(57) Abstract: An algorithm has been developed to identify four DNA sequences of 20 bases or more that form a structure called a
connectron. Two of the sequences, C1 and C2, are adjacent to each other. These sequences are expressed as RNA in the 3'UTR of some genes
in many prokaryotic, archaeal and eukaryotic genomes. The other half of a connectron consists of two DNA sequences, T1 and T2, that are on
the same chromosome and lie about 1 kb to 105 kb apart. The C1 sequence is identical to the T1 sequence
and the C2 sequence is identical to the T2 sequence. C1/C2 and T1-T2 can be on different chromosomes. The C1/C2 RNA sequence
of the gene transcript locates the two double-stranded DNA sequences T1 and T2. The single-stranded RNA and double-stranded
DNA then form a triple-stranded Hoogsteen helix of the RNA/DNA/DNA variety. Because the C1 sequence is adjacent to the C2
sequence, the T1 sequence is brought spatially adjacent to the T2 sequence in a compact X-shaped structure. Chromatin particles then form
compact 30 nm assemblies in the DNA between T1 and T2, thus removing the intervening genes from promotion and expression.
Connectrons remove sets of genes from expression and thus modulate the behavior of many types of cells.
ALGORITHMIC DETERMINATION OF FLANKING DNA SEQUENCES
THAT CONTROL THE EXPRESSION OF SETS OF GENES IN
PROKARYOTIC, ARCHEA AND EUKARYOTIC GENOMES



Reference to Related Application

The present application is the subject of Provisional Application Serial No. 
60/208,650 filed June 2, 2000 entitled ALGORITHMIC DETERMINATION OF 
CONNECTRONS FOR THE HIGH LEVEL REGULATION OF GENE 
EXPRESSION. 

Introduction

RNA introduced into a cell by a virus is now known to trigger a cellular defense
mechanism known as post-transcriptional gene silencing (PTGS). If the viral RNA
sequence matches a sequence within the cell's genome, the associated genes are turned
off, or silenced. This phenomenon is also called 'RNA interference' or RNAi. A
single-stranded RNA can interact with another single-stranded RNA (known as
antisense RNA). The single-stranded RNA can also form a triple-stranded complex
with double-stranded DNA. This triple-stranded complex is known as a Hoogsteen
helix.

This patent application shows how two specific adjacent single-stranded RNA
sequences (called C1 and C2, for Control Sequence 1 and Control Sequence 2)
interact with two distant double-stranded DNA sequences (called T1 and T2, for
Target Sequence 1 and Target Sequence 2) to form a tetradic relationship which is
called a "connectron". The two distant double-stranded DNA sequences (T1 and T2)
must be on the same chromosome in a genome, and they must lie between about 1 kb
and 105 kb apart. The adjacent single-stranded RNA sequences (C1/C2) can be on
the same chromosome as the T1 and T2 sequences or on a different one. The C1
sequence is identical to the T1 sequence and the C2 sequence is identical to the T2
sequence.

The connectron acts to stabilize the double-stranded DNA by allowing 30 nm
chromatin particles to form. Genes that lie between the T1 and T2 sequences are not
open to promotion and expression when they are wrapped up in 30 nm chromatin
particles. The connectron (i.e. the tetradic relationship between the T1-T2
sequences and the C1/C2 sequences) provides a general explanation for PTGS. A
connectron can be implemented by RNA sequences, PNA (Peptide Nucleic Acid)
sequences or by a zinc-finger DNA Binding Protein (DBP) specific to the T1 and T2
sequences.

Characteristically, the adjacent C1/C2 sequences lie in the 3'UTR of a gene. The T1
and T2 sequences do not lie within the translated region of any gene; these
sequences "surround" one or more genes. There are, however, T1 and T2 sequence
pairs that surround one or more C1/C2 sequences that are not 3'UTR to any gene.
These are called "geneless connectrons". There may be promoter sequences that
cause the transcription of these C1/C2 sequences.

A computer-based algorithm, similar to the algorithm used in US Patent
6,205,404, has been developed to determine the connectron structure of any genome.
This algorithm determines all the connectrons that exist in the genomic DNA.
Connectrons exist in prokaryotes, archaea, single-celled eukaryotes, multi-celled
eukaryotes, plants and higher animals. Connectron relationships also exist between
prokaryotes and their plasmids. The geneless connectrons provide a possible
mechanism for forming a hierarchy of gene expression control that will lead to an
understanding of cell differentiation and tissue development.

Each connectron is a unique tetrad of sequences, and each connectron changes the
expression of the genes between the T1 and T2 sequences. The C1 sequence (which is
equivalent to the T1 sequence) and the C2 sequence (which is equivalent to the T2
sequence) are determined by the invention described in this patent application. In
general, the tetrad of connectron sequences can be patented because the structure of
matter is known and the function of specific gene expression modulation is also
known. Gene expression can be modified by introducing antisense RNA or PNA that
interacts with the C1/C2 RNA sequences, or zinc-finger DBPs that interact with
the T1 and T2 sequences. Using connectrons it will be possible to modify cellular
and tissue behavior in a very general manner.

Examples will be given from different genomes to illustrate that the connectron is a
perfectly general and universal concept.



Definitions 

Double stranded DNA - Watson and Crick showed in 1953 that DNA naturally forms 
a double-stranded helix. A typical double stranded sequence is 

5'-TAGAGGAGTACCAC-3'
3'-ATCTCCTCATGGTG-5'
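The base-pairing rule behind this example can be sketched in Python. This is a minimal illustration assuming only the standard Watson-Crick pairs (A with T, G with C); the helper names `complement` and `reverse_complement` are hypothetical and are not taken from the described algorithm.

```python
# Watson-Crick base pairs: A pairs with T, G pairs with C.
DNA_PAIRS = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Partner strand base by base, aligned 3' to 5' under the input."""
    return strand.upper().translate(DNA_PAIRS)


def reverse_complement(strand: str) -> str:
    """Partner strand written in the usual 5' to 3' direction."""
    return complement(strand)[::-1]


positive = "TAGAGGAGTACCAC"              # 5' to 3'
print(complement(positive))              # ATCTCCTCATGGTG (read 3' to 5')
print(reverse_complement(positive))      # GTGGTACTCCTCTA (read 5' to 3')
```

Reading the complement right to left gives the negative strand in the conventional 5' to 3' direction, which is how a search over both strands would typically represent it.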



Hydrogen Bond - The force between a hydrogen atom and another heavier atom such
as Oxygen (O), Nitrogen (N), Phosphorus (P), or Sulfur (S).

Positive strand - The positive strand is normally represented 5' to 3' running left to 
right as in 

5'-TAGAGGAGTACCAC-3'

Negative strand - The negative strand is normally represented 5' to 3' running right to 
left as in 

3'-ATCTCCTCATGGTG-5'
Single stranded RNA - Either the positive or the negative strand of the
double-stranded DNA can be transcribed by the polymerase. In RNA, U (uracil) replaces T (thymine).

RNA of positive strand sequence 5'-UAGAGGAGUACCAC-3' 
RNA of negative strand sequence 5'-GUGGUACUCCUCUA-3'

Antisense RNA - The antisense strand of any RNA sequence is the complementary
sequence

RNA sequence 5'-UAGAGGAGUACCAC-3' 
Antisense RNA sequence 3'-AUCUCCUCAUGGUG-5'
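The RNA forms in these definitions follow mechanically from the DNA strands. The sketch below uses hypothetical helper names and assumes the definitions exactly as stated: U replaces T in RNA, and antisense RNA is the base-by-base complement written 3' to 5' under the original.

```python
DNA_PAIRS = str.maketrans("ACGT", "TGCA")
RNA_PAIRS = str.maketrans("ACGU", "UGCA")


def to_rna(dna: str) -> str:
    """RNA with the same sequence as the given DNA strand: U replaces T."""
    return dna.upper().replace("T", "U")


def negative_strand_rna(positive_dna: str) -> str:
    """RNA with the negative-strand sequence, written 5' to 3'."""
    negative_5_to_3 = positive_dna.upper().translate(DNA_PAIRS)[::-1]
    return to_rna(negative_5_to_3)


def antisense(rna: str) -> str:
    """Complementary RNA, aligned 3' to 5' under the input as in the example above."""
    return rna.upper().translate(RNA_PAIRS)


positive = "TAGAGGAGTACCAC"
print(to_rna(positive))                 # UAGAGGAGUACCAC (read 5' to 3')
print(negative_strand_rna(positive))    # GUGGUACUCCUCUA (read 5' to 3')
print(antisense(to_rna(positive)))      # AUCUCCUCAUGGUG (read 3' to 5')
```

`to_rna(positive)` is also the RNA strand shown in the triple-strand helix example that follows.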



Triple Strand Helix - The RNA sequence of an RNA/DNA triple-strand complex is the
same as the positive strand of the DNA, with U in place of T.

DNA positive strand 5'-TAGAGGAGTACCAC-3'

DNA negative strand 3'-ATCTCCTCATGGTG-5'

RNA strand 5'-UAGAGGAGUACCAC-3'

Promoter - Any region of DNA that binds proteins which engage the polymerase
transcription mechanism.

TATA Box - A region near the 3' end of a promoter with the sequence TATA. 


mRNA - The RNA produced from the DNA by the polymerase as a result of 
transcription 

Start of transcription - The 3' end of a promoter where the polymerase mechanism
begins to transcribe DNA into mRNA.

Exon - Any region of mRNA which is used to code for proteins 
Intron - Any region of mRNA lying between two exons which is not used to code for
proteins. The introns are edited out of the initial RNA transcript to form the mature 
mRNA. 


3'UTR - The untranslated 3' end of an mRNA, lying beyond the end of the last exon. A
stop codon in the mRNA causes the ribosome to stop the translation of mRNA into
protein.

End of translation - The 3' end of the 3'-most exon.

Translated region - Any collection of exons and introns. 

Gene - Any DNA region that codes for a protein. Introns do not occur in prokaryotic
genes, and they are sometimes absent from eukaryotic genes. A typical model of a gene
is

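|<---- Promoter ---->|
        |<-TATA Box->|
                      |<-Beginning of Translation
                      |<------------- Translated Region -------------->|
                                                   End of Translation->|
                      |<-Exon->|<-Intron->|<-Exon->|<-Intron->|<-Exon->|<-3'UTR->|
================================================================================== + strand
================================================================================== - strand
|<------------------------------------ Gene ------------------------------------>|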

Positive strand gene - Any gene in which the features run 5' to 3' on the positive
strand

Negative strand gene - Any gene in which the features run 5' to 3' on the negative 
strand 
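One way to hold the gene model in a program is sketched below. The `Gene` record and its coordinate convention (0-based, half-open intervals measured along the positive strand) are assumptions made for illustration. The point it shows is that on a negative strand gene the features run right to left, so its end of translation is the leftmost exon boundary.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are 0-based, half-open, on the + strand axis."""
    name: str
    strand: str                      # "+" or "-"
    exons: list[tuple[int, int]]     # (start, end) of each exon

    def translated_region(self) -> tuple[int, int]:
        """Span covering all exons and the introns between them."""
        return min(s for s, _ in self.exons), max(e for _, e in self.exons)

    def introns(self) -> list[tuple[int, int]]:
        """Gaps between consecutive exons (empty for a one-exon gene)."""
        ordered = sorted(self.exons)
        return [(a_end, b_start)
                for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
                if b_start > a_end]

    def end_of_translation(self) -> int:
        """Coordinate of the 3' end of the 3'-most exon.

        On a "+" strand gene 3' is to the right; on a "-" strand gene the
        features run right to left, so 3' is the leftmost exon start.
        """
        lo, hi = self.translated_region()
        return hi if self.strand == "+" else lo


g = Gene("g1", "+", [(100, 200), (300, 400), (500, 650)])
print(g.translated_region(), g.introns(), g.end_of_translation())
```

The 3'UTR, and therefore any C1/C2 sequence within it, would then be sought on the 3' side of `end_of_translation()`.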

C1 sequence - Any positive or negative strand DNA sequence of 20 bases or more.
The C2 sequence must occur in the same chromosome as the C1 sequence.

C2 sequence - Any positive or negative strand DNA sequence of 20 bases or more.
The C1 sequence must occur in the same chromosome as the C2 sequence.


C1/C2 - Any positive or negative strand DNA sequence of 40 or more bases such that
the C1 sequence is adjacent to the C2 sequence

T1 sequence - Any positive or negative strand DNA sequence of 20 bases or more
that is on the same chromosome as the T2 sequence. The T1 and T2 sequences must
be between about 1 kb and 105 kb apart.

T2 sequence - Any positive or negative strand DNA sequence of 20 bases or more
that is on the same chromosome as the T1 sequence. The T2 and T1 sequences must
be between about 1 kb and 105 kb apart.
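The C1, C2, C1/C2, T1 and T2 rules above can be checked directly. This is a minimal sketch with hypothetical function names; it treats "about 1 kb" and "about 105 kb" as hard bounds and assumes that "apart" means the number of bases between the end of the upstream site and the start of the downstream site, which the definitions do not specify.

```python
MIN_SEQ_LEN = 20          # C1, C2, T1 and T2 are each 20 bases or more
MIN_SPACING = 1_000       # "about 1 kb", treated here as a hard bound
MAX_SPACING = 105_000     # "about 105 kb", treated here as a hard bound


def split_c1_c2(c1c2: str, c1_len: int) -> tuple[str, str]:
    """Split an adjacent C1/C2 sequence into its C1 and C2 parts."""
    c2_len = len(c1c2) - c1_len
    if c1_len < MIN_SEQ_LEN or c2_len < MIN_SEQ_LEN:
        raise ValueError("C1 and C2 must each be at least 20 bases long")
    return c1c2[:c1_len], c1c2[c1_len:]


def valid_target_pair(t1_chrom: str, t1_start: int, t1_len: int,
                      t2_chrom: str, t2_start: int, t2_len: int) -> bool:
    """Same chromosome, 20+ bases each, and about 1 kb to 105 kb apart."""
    if t1_chrom != t2_chrom or min(t1_len, t2_len) < MIN_SEQ_LEN:
        return False
    # Order the two sites so spacing is measured upstream end -> downstream start.
    (up_start, up_len), (down_start, _) = sorted([(t1_start, t1_len), (t2_start, t2_len)])
    spacing = down_start - (up_start + up_len)
    return MIN_SPACING <= spacing <= MAX_SPACING
```

Because C1/C2 must be 40 or more bases, any split that leaves either half shorter than 20 bases is rejected.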

Last exon gap or Gap-Distance - The number of bases between the end of
transcription and the beginning of the C1/C2 sequence. In prokaryotes and
single-celled eukaryotes this gap can range from no bases to 500 bases. In multi-celled
eukaryotes the gap can be as large as 10,000 bases.
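The Gap-Distance limits can be expressed as a small lookup. The sketch below is illustrative only: the organism labels and function names are hypothetical, coordinates are assumed to be 0-based half-open positions on the positive-strand axis, and `gene_end` stands for whichever gene end point the Gap-Distance is measured from.

```python
# Largest Gap-Distance allowed for each class of organism, in bases.
MAX_GAP = {
    "prokaryote": 500,
    "single-celled eukaryote": 500,
    "multi-celled eukaryote": 10_000,
}


def gap_distance(gene_end: int, c1c2_start: int, c1c2_end: int, strand: str) -> int:
    """Bases from the gene end to the C1/C2 site, counted in the 3' direction.

    On a "+" strand gene 3' is to the right; on a "-" strand gene it is to the left.
    """
    if strand == "+":
        return c1c2_start - gene_end
    return gene_end - c1c2_end


def within_gap_distance(gene_end: int, c1c2_start: int, c1c2_end: int,
                        strand: str, organism: str) -> bool:
    """True if the C1/C2 site lies 3' of the gene and no farther than the allowed gap."""
    gap = gap_distance(gene_end, c1c2_start, c1c2_end, strand)
    return 0 <= gap <= MAX_GAP[organism]
```

Applied to a Possible Connectron and a nearby gene, a check of this kind is what would separate a Real Connectron from a merely Possible one.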

Poly-adenylation signal - A number of Adenosine (A) bases are added to the mRNA
at the end of the 3'UTR.

Possible Connectron - Any set of T1, T2 and C1/C2 sequences such that the C1
sequence is identical to the T1 sequence and the C2 sequence is identical to the T2
sequence. The promoter of some gene causes the mRNA of the gene to be expressed.
The mRNA is edited to eliminate the introns. The whole mRNA, including the 3'UTR,
can move about in the cell or the nucleus of the cell. The C1/C2 RNA that is part of
the 3'UTR moves to the T1 and T2 DNA sequences. A triple-stranded complex of
the DNA and the RNA forms such that the C1 sequence forms hydrogen bonds with
the T1 sequence and the C2 sequence forms hydrogen bonds with the T2 sequence.
Because the C1 sequence is adjacent to the C2 sequence, the T1 sequence is brought
physically close to the T2 sequence. This produces a loop of between about 1 kb and
105 kb in the DNA. Histone proteins reduce the length of the DNA by binding 200
bases each. Histone/DNA complexes form chromatin assemblies with six-fold symmetry. The
diameter of the chromatin assemblies is approximately 30 nm.
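The search for Possible Connectrons on a single chromosome can be sketched as an exact-match scan. This is a naive, quadratic illustration with hypothetical function names, not the algorithm of US Patent 6,205,404 or of this application. It searches only the strand it is given, and it assumes that "apart" means the number of bases between the end of the upstream site and the start of the downstream site.

```python
MIN_SPACING = 1_000
MAX_SPACING = 105_000


def find_all(haystack: str, needle: str) -> list[int]:
    """Start positions of every (possibly overlapping) exact occurrence."""
    hits, pos = [], haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


def possible_connectrons(c1: str, c2: str, chromosome: str,
                         min_spacing: int = MIN_SPACING,
                         max_spacing: int = MAX_SPACING) -> list[tuple[int, int]]:
    """(T1 start, T2 start) pairs where T1 == C1 and T2 == C2 exactly."""
    pairs = []
    for t1 in find_all(chromosome, c1):
        for t2 in find_all(chromosome, c2):
            # Bases between the end of the upstream site and the start of the downstream one.
            if t1 <= t2:
                spacing = t2 - (t1 + len(c1))
            else:
                spacing = t1 - (t2 + len(c2))
            if min_spacing <= spacing <= max_spacing:
                pairs.append((t1, t2))
    return pairs


c1 = "AAAAACCCCCGGGGGAAAAA"
c2 = "CCCCCGGGGGCCCCCGGGGG"
chromosome = "T" * 100 + c1 + "T" * 2000 + c2 + "T" * 100
print(possible_connectrons(c1, c2, chromosome))
```

For whole genomes, an indexed search (for example, hashing every 20-base window once) would replace the repeated `find` scans; the pairing and spacing logic would stay the same.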

Real Connectron - Any Possible Connectron which is within the Gap-Distance of 
some gene 

Homologous connectron - The T1 sequence and the T2 sequence are on the same
chromosome as the C1/C2 sequence

Heterologous connectron - The T1 sequence and the T2 sequence are on a
chromosome different from the chromosome of the C1/C2 sequence


Permanent connectron - Any C1/C2 sequence which is 3'UTR to some gene that is
not surrounded by any T1 and T2 sequence pair

Transient connectron - Any C1/C2 sequence which is 3'UTR to some gene that is
surrounded by one or more T1 and T2 sequence pairs

Self-limiting connectron - Any C1/C2 sequence which is 3'UTR to some gene that is
surrounded by the T1 and T2 sequences such that C1=T1 and C2=T2

Geneless connectron - Any C1/C2 sequence which is not 3'UTR to some gene but is
surrounded by some T1 and T2. A promoter may lie 5' to the C1/C2 sequence.
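Once site positions are known, the connectron classes defined above can be told apart mechanically. In this sketch, `Site`, `between`, `homology` and `classify_control_site` are hypothetical names, and sites are 0-based half-open intervals. A C1/C2 that is neither 3'UTR to a gene nor surrounded by a T1-T2 pair falls outside the definitions, so it is labelled "unclassified". The self-limiting case is checked before the broader transient case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical genomic interval: chromosome name, 0-based half-open bounds."""
    chrom: str
    start: int
    end: int


def between(t1: Site, t2: Site, site: Site) -> bool:
    """True if `site` lies entirely in the DNA between the T1 and T2 sites."""
    if not (t1.chrom == t2.chrom == site.chrom):
        return False
    left, right = sorted([t1, t2], key=lambda s: s.start)
    return left.end <= site.start and site.end <= right.start


def homology(c1c2: Site, t1: Site) -> str:
    """Homologous if C1/C2 shares the T1-T2 chromosome, else heterologous."""
    return "homologous" if c1c2.chrom == t1.chrom else "heterologous"


def classify_control_site(c1c2: Site, is_3utr: bool,
                          target_pairs: list[tuple[Site, Site, bool]]) -> str:
    """Label a C1/C2 site; each pair is (T1, T2, own), own meaning C1=T1 and C2=T2."""
    enclosing = [own for t1, t2, own in target_pairs if between(t1, t2, c1c2)]
    if not is_3utr:
        return "geneless" if enclosing else "unclassified"
    if not enclosing:
        return "permanent"
    return "self-limiting" if any(enclosing) else "transient"
```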

Bidirectionality of Connectron Excitation - A C1/C2 short loop on one strand selects
a T1-T2 long loop pair on the same or the opposite strand. The C1/C2 short loop has
a complementary C1'/C2' sequence on the opposite strand. Similarly, the T1-T2 long
loop pair has a complementary long loop pair T1'-T2'. Wherever a C1/C2, T1-T2
tetrad exists there is a complementary C1'/C2', T1'-T2' tetrad. The C1/C2 short loop
can be transcribed as a 3'UTR to a gene on the same strand. The C1'/C2' short loop,
which is on the strand opposite to the C1/C2 short loop, can also be transcribed as
a 3'UTR to a gene on its own strand. There are four possible models of action:
[Diagram: the four models of action, showing the short loops (gene-C1/C2, C2/C1-gene, C2'/C1'-gene, gene-C1'/C2') and the long loop pairs (T1 T2, T1' T2', T2' T1') on the + and - strands.]

Of course, the short loops and the long loops do not have to be on the same 
chromosome. 
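The complementary tetrad follows from reverse complementation. The sketch below, with hypothetical function names, checks two consequences of the definitions. First, if C1=T1 and C2=T2 then C1'=T1' and C2'=T2'. Second, on the opposite strand, read 5' to 3', the adjacent pair appears in the order C2'/C1'.

```python
PAIRS = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Opposite-strand sequence, written 5' to 3'."""
    return seq.upper().translate(PAIRS)[::-1]


def complementary_tetrad(c1: str, c2: str, t1: str, t2: str) -> dict[str, str]:
    """Primed (opposite-strand) sequences of a C1/C2, T1-T2 tetrad."""
    return {name: reverse_complement(seq)
            for name, seq in (("C1'", c1), ("C2'", c2), ("T1'", t1), ("T2'", t2))}


c1 = "AAAAACCCCCGGGGGTTTTT"
c2 = "ACGTACGTACGTACGTACGA"
tetrad = complementary_tetrad(c1, c2, t1=c1, t2=c2)   # a tetrad with C1=T1, C2=T2
# The primed tetrad also satisfies C1'=T1' and C2'=T2'.
assert tetrad["C1'"] == tetrad["T1'"] and tetrad["C2'"] == tetrad["T2'"]
# Read 5' to 3' on the opposite strand, the adjacent pair appears as C2'/C1'.
assert reverse_complement(c1 + c2) == tetrad["C2'"] + tetrad["C1'"]
```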

Hierarchy of connectron action - When a C1/C2 is expressed it forms a T1-T2 loop
by forming a connectron. The C1/C2 sequence does not have to be on the same
chromosome as the T1 and T2 sequences, which provides a way of causing interaction
between chromosomes. When the T1-T2 loop forms, any genes in that loop region
which had been expressing C1/C2 sequences in their 3'UTRs now cease expressing
those C1/C2 sequences. The connectrons formed by these C1/C2 sequences will cease
to exist after some time, thus opening up the genes inside the respective T1-T2 loops
to expression. The hierarchy of connectron action therefore alternates between repression
and expression. Connectron hierarchies can be of any depth.
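The alternation can be illustrated with a toy steady-state model. It assumes a simple nested chain in which the T1-T2 loop at each level encloses the gene whose 3'UTR carries the C1/C2 for the next level, and it ignores the timing of connectron formation and decay. `hierarchy_states` is a hypothetical name.

```python
def hierarchy_states(depth: int, top_expressed: bool = True) -> list[str]:
    """Toy steady state of the genes inside each loop of a nested connectron chain.

    A loop forms only while its C1/C2 is being expressed, and the genes inside a
    formed loop stop expressing the C1/C2 for the next level down.
    """
    states = []
    c1c2_expressed = top_expressed
    for _ in range(depth):
        loop_formed = c1c2_expressed
        states.append("repressed" if loop_formed else "expressed")
        # Repressed genes cannot express the next level's C1/C2, and vice versa.
        c1c2_expressed = not loop_formed
    return states


print(hierarchy_states(4))
```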

One-to-Many connectron action - One C1/C2 sequence can form connectrons in
many different places on many different chromosomes. The only requirement is that
C1=T1 and C2=T2. This makes it possible for one expression event to control the
expression of many genes on different chromosomes.

Many-to-One connectron action - C1/C2s that come from many different places on
many different chromosomes can form a connectron with a specific T1-T2 sequence
pair. The only requirement is that C1=T1 and C2=T2. This makes it possible for
many different expression events to control the expression of one set of genes on a
particular chromosome.

Many-to-Many connectron action - The arrangement of C1/C2s and T1-T2s across
chromosomes can form a complex web of gene expression control relationships.
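Because binding requires only C1=T1 and C2=T2, the one-to-many and many-to-one views are two indexes over the same matches. The sketch below is a minimal illustration with hypothetical site identifiers and function names. It uses exact sequence equality and does not model chromosomes or strands.

```python
from collections import defaultdict


def build_connectron_web(controls: dict[str, tuple[str, str]],
                         targets: dict[str, tuple[str, str]]) -> dict[str, list[str]]:
    """One-to-many view: every target pair each control site can bind (C1=T1, C2=T2)."""
    targets_by_sequence: defaultdict[tuple[str, str], list[str]] = defaultdict(list)
    for target_id, (t1, t2) in targets.items():
        targets_by_sequence[(t1, t2)].append(target_id)
    return {control_id: list(targets_by_sequence.get((c1, c2), []))
            for control_id, (c1, c2) in controls.items()}


def regulators_of(web: dict[str, list[str]]) -> dict[str, list[str]]:
    """Many-to-one view: every control site that can bind each target pair."""
    regulators: defaultdict[str, list[str]] = defaultdict(list)
    for control_id, target_ids in web.items():
        for target_id in target_ids:
            regulators[target_id].append(control_id)
    return dict(regulators)
```

Together the two mappings describe the many-to-many web: each edge is one control site paired with one target pair it can bind.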

Percentage of the Genome Regulated by Connectrons - Since the connectrons for a
sequenced genome can be calculated, the percentage of the genome that is open to
connectron regulation can be known.
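One way to compute the percentage is to merge the T1-T2 loops into non-overlapping intervals and divide the covered length by the genome length. This sketch assumes loops are given as 0-based half-open (start, end) intervals on a single coordinate axis (for a multi-chromosome genome, for example, chromosomes laid end to end). The function name is hypothetical.

```python
def percent_regulated(loops: list[tuple[int, int]], genome_length: int) -> float:
    """Percent of the genome inside at least one T1-T2 loop; overlaps count once."""
    covered = 0
    run_start, run_end = None, None
    for start, end in sorted(loops):
        if run_end is None or start > run_end:
            # Close the previous merged run (if any) and start a new one.
            if run_end is not None:
                covered += run_end - run_start
            run_start, run_end = start, end
        else:
            # Overlapping or touching loop: extend the current run.
            run_end = max(run_end, end)
    if run_end is not None:
        covered += run_end - run_start
    return 100.0 * covered / genome_length


print(percent_regulated([(0, 100), (50, 150), (300, 400)], 1000))
```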

Emergent Property - The network of connectrons in any genome emerges from a
knowledge of the complete DNA sequence of the genome. Because both the C1/C2
sequences and the T1-T2 sequences can be anywhere in the genome, the whole
genomic sequence must be known before all the connectrons can be determined.

Paradigm Shift - For the fifty years since the discovery by Watson and Crick of
the double-helical nature of DNA, the reigning paradigm for scientific discovery has
been the study of one gene and its effects on the behavior of a cell. The advent of
genomic sequencing and this invention of connectrons that emerge from the whole
genome will produce a shift in the way scientists view biological systems and the way
they formulate and execute experiments. The many-to-many relationships between
the connectrons mean that there are many ways in which the expression of a set of
genes can be modulated. This multiplicity of control pathways produces a
system stability that makes it possible for biological systems to be stable for long
periods of evolutionary time. The thinking that goes into formulating scientific
experiments will have to change to accommodate the changes in understanding that
will be induced by the application and extension of this patent application.

Hierarchy of DNA Structuring - The DNA of a cell's genome is structured in a
hierarchy of six levels. Figures 1, 2 and 3 have been adapted from The Molecular
Biology of the Cell by Alberts, Bray, Lewis, Raff, Roberts and Watson [third edition,
pages 354, 345 and 348]. As shown in figure 1, the double-stranded DNA is level 1.
The double-stranded DNA is wrapped around histone proteins to form chromatin
particles, which make up level 2 of the hierarchy. Level 2 is described as "beads-on-a-string" in
figure 1. The chromatin particles are packed with six-fold symmetry, as shown in
figure 2a and figure 2b. These six-fold assemblies, which form level 3 of the hierarchy,
have a diameter of 30 nm. Each 30 nm assembly contains from 18 (i.e. 6*3) to 30 (i.e. 6*5) chromatin particles.
The 30 nm assemblies aggregate into large loops which range in length from 5,000
bases to 100,000 bases of DNA. As shown in figure 1, these large loops are
approximately 300 nm in size; they constitute level 4 of the structuring
hierarchy. As shown in figure 1, at level 5 of the DNA structuring hierarchy many large
loops are condensed to form a structure which is approximately 700 nm in diameter.
The complete chromosome that constitutes level 6 of the hierarchy is composed of
two very long sections of level 5 DNA.
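The figures quoted above imply some simple arithmetic. Taking the 200 bases per chromatin particle stated in the Definitions and 18 to 30 particles per 30 nm assembly, each assembly packages roughly 3,600 to 6,000 bases. A T1-T2 loop of about 1 kb to 105 kb holds roughly 5 to 525 chromatin particles. The sketch below only restates that arithmetic, and its names are hypothetical.

```python
BASES_PER_PARTICLE = 200             # bases bound per chromatin particle
PARTICLES_PER_ASSEMBLY = (18, 30)    # 6*3 to 6*5 particles per 30 nm assembly


def bases_per_assembly() -> tuple[int, int]:
    """Range of bases packaged by one 30 nm assembly."""
    lo, hi = PARTICLES_PER_ASSEMBLY
    return lo * BASES_PER_PARTICLE, hi * BASES_PER_PARTICLE


def particles_in_loop(loop_bases: int) -> int:
    """Whole chromatin particles that fit in a loop of the given length."""
    return loop_bases // BASES_PER_PARTICLE


print(bases_per_assembly())
for loop in (1_000, 105_000):         # the T1-T2 spacing limits
    print(loop, particles_in_loop(loop))
```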

Model of Chromatin Structure - The level 4 structure of DNA as shown in figure 1
ranges in length from 5,000 to 105,000 bases of DNA. Figure 3 shows that proteins
are thought to connect portions of the long loops formed by the 30 nm particles to 
form a chromosome axis. These condensed long loops are described as chromomeres 
in The Molecular Biology of the Cell. 
Prior Art

The chromomere model of DNA structuring was presented by N. A. Resnik et al. [1]
and is based on electron microscopic data. There are more recent papers studying a
variety of genomes with electron microscopy, but no equivalent study of chromomeres
has been done on a fully sequenced genome.

A recent News Feature in Nature by T. Gura [2] described the discovery of
post-transcriptional gene silencing, in which viral RNA interacts with the transcribed RNA
of the cell to silence the expression of genes. This article describes experiments in
C. elegans and D. melanogaster in which RNA that is complementary to mRNA is
introduced into a cell. This "antisense" RNA has the effect of turning off the
expression of one or more genes. The introduced complementary RNA produces an
"RNA interference" called RNAi.

Thomas Werner and his colleagues at Genomatix in Munich, Germany have
developed an approach to understanding what they call the "Matrix Attachment Region"
(MAR). Figure 5 shows their interpretation of the structure of DNA surrounding a
gene. The following description of the MAR is copied from the Genomatix web site:

"Matrix Attachment Regions (MARs)

MARs are sequence regions that are responsible for the attachment of genomic DNA
to the nuclear matrix or scaffold. Transcription absolutely requires anchorage of
genomic DNA to the nuclear matrix.

Functional features of MARs:

Anchoring of regulatory elements like promoters and enhancers to the nuclear
matrix.

Ensuring long term activity of promoters and enhancers in chromatin.

Insulation, rendering a functional domain insensitive to position effects.

Genomatix is conducting a research project to define and detect MARs by computer
analysis."
Brief Description of the Objects of the Invention

An object of the invention is to provide a method of identifying DNA sequences that
control the expression of different collections of genes in a genome comprising,
detecting selected DNA sequences adjacent to some genes excluding exons and
introns.

An object of the invention is to provide a method of identifying DNA sequences that
control the expression of different collections of genes comprising, detecting, by
computer, one or more pairs of non-adjacent DNA sequences to which two RNA
sequences are bound.

An object of the invention is to provide a method of identifying DNA sequences that
control the expression of different collections of genes in a genome comprising
detecting changes in connectron behavior in the genome.

An object of the invention is to provide a method of modifying the expression of
different gene collections in a genome, comprising detecting changes in connectron
behavior as a result of an exogenous stimulus.

An object of the invention is to provide a method of detecting where and when new 
genes are being integrated into a host genome comprising detecting the connectrons in 
said host genome. 



An object of the invention is to provide a method of detecting the expression effect of 
different gene collections in a given body comprising detecting the back and forth 
flow of connectrons between the chromosomes thereof. 
An object of the invention is to provide a method of modifying a given body
comprising modifying the connectron organization therein. 

An object of the invention is to provide a method of detecting connectron control and
target sequences in a given genome comprising:

determining the base composition of said genome,

determining one or more sites of control sequence organization, and/or

determining one or more sites of target application.
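The text does not define "base composition". If it is read as the frequency of each base, the first step could be sketched as follows. The function names are hypothetical, and symbols other than A, C, G and T (for example N for unknown bases) are simply skipped.

```python
from collections import Counter


def base_composition(sequence: str) -> dict[str, float]:
    """Fraction of A, C, G and T among the recognised bases of a DNA sequence."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(sequence: str) -> float:
    """Combined fraction of G and C bases."""
    composition = base_composition(sequence)
    return composition["G"] + composition["C"]


print(base_composition("AACGTN"))
```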


An object of the invention is to provide a method of determining the response of a cell
in any tissue to changes in the cell's environment and/or genetic composition
comprising providing a complete genomic DNA sequence for the organism and
determining the effect of changes in connectrons due to application of a given
exogenous stimulus to the genome.

An object of the invention is to provide a method of determining in prokaryotes,
archaea, single-celled eukaryotes and multi-celled eukaryotes, the tetradic relationship
T1=C1 and T2=C2 where T1 and T2 are DNA sequences 20 or more bases in length,
where the C1 sequence is adjacent to the C2 sequence, where the T1 and T2
sequences are on the same chromosome, and where the C1/C2 sequences are on the
same chromosome as T1 and T2 or where the C1/C2 sequences are on a chromosome
different from T1 and T2, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1 kb and 105 kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1 kb and 105 kb apart.

An object of the invention is to provide a method of determining in prokaryotes,
archaea, single-celled eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits many different C1/C2 short loops to control the existence of
a T1-T2 long loop and wherein said C1/C2 short loops can be on the same
chromosome or on different chromosomes from the T1-T2 long loop, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1 kb and 105 kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1 kb and 105 kb apart.

An object of the invention is to provide a method of determining in prokaryotes,
archaea, single-celled eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits one C1/C2 short loop to control the existence of many T1-T2
long loops, where the C1/C2 short loop can be on the same chromosome or on different
chromosomes from the T1-T2 long loops, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1 kb and 105 kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1 kb and 105 kb apart.

An object of the invention is to provide a method of determining the connectron
relationships between prokaryotes and their plasmids, wherein said connectrons
implement a control mechanism between the two genomes that makes it possible for
them to form a symbiotic relationship, and in the case of D. radiodurans the
relationship is not symmetric, and the D. radiodurans genome sends C1/C2 short
loops to the MP1 plasmid, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1 kb and 105 kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1 kb and 105 kb apart.

An object of the invention is to provide a method of determining the connectron
relationships that exist in plants and higher animals.

An object of the invention is to provide a method of determining in prokaryotes,
archaea, single-celled eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits one C1/C2 short loop to control the existence of one or more
T1-T2 long loops without being subject to any expression controls other than those of
the gene to which the C1/C2 is 3'UTR, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1 kb and 105 kb apart,

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1 kb and 105 kb apart, and

3'UTR - the untranslated 3' end of an mRNA, beyond the end of the last exon; a
stop codon in the mRNA causes the ribosome to stop the translation of mRNA
into protein.

An object of the invention is to provide a method of determining in prokaryotes,
archaea, single-celled eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits one C1/C2 short loop to control the existence of one or more
T1-T2 long loops such that this C1/C2 short loop is itself subject to expression control
by another T1-T2 long loop which surrounds it, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1 kb and 105 kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1 kb and 105 kb apart.

An object of the invention is to provide a method of determining in prokaryotes,
archaea, single-celled eukaryotes and multi-celled eukaryotes, the connectron
relationship that permits one C1/C2 short loop to control the existence of the T1-T2
long loop that surrounds it, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1 kb and 105 kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 and T1
sequences must be between about 1 kb and 105 kb apart.

An object of the invention is to provide a method of determining the connectron
relationships that do not have any genes within the T1-T2 long loop, wherein:

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, and the T2 and T1
sequences must be between about 1 kb and 105 kb apart.

An object of the invention is to provide a method of determining the geneless connectron relationship in which one C1/C2 short loop controls the existence of many geneless T1-T2 long loops, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or more; the C2 sequence must occur on the same chromosome as the C1 sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or more; the C1 sequence must occur on the same chromosome as the C2 sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or more that is on the same chromosome as the T2 sequence; the T1 and T2 sequences must be between about 1 kb and 105 kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or more that is on the same chromosome as the T1 sequence; the T1 and T2 sequences must be between about 1 kb and 105 kb apart.
Description of the Drawings and Tables

The above and other objects, advantages and features of the invention will become 
more apparent when considered with the following specification and accompanying 
drawings and tables wherein: 

Figure 1 DNA is structured in six levels of increasing condensation. Double-stranded DNA is level 1. Two turns of DNA are wrapped about each chromatin particle at level 2. The chromatin particles, each containing 200 base pairs, form 30 nm particles at level 3. The 30 nm particles form large loops with an approximate dimension of 300 nm at level 4. Metaphase chromosomes form a condensed structure with an approximate dimension of 700 nm at level 5. An entire metaphase chromosome has a width of approximately 1400 nm at level 6. The large loops at level 4 of the DNA structuring are thought to contain between 20,000 (20 kb) and 100,000 (100 kb) base pairs.

Molecular Biology of the Cell by Alberts, Bray, Lewis, Raff, Roberts and Watson, 3rd ed., Garland Publishing, Inc., New York, 1994, p. 354

Figure 2 (a) Chromatin DNA forms 30 nm particles with six-fold symmetry.

(b) The six-fold symmetric 30 nm particles form a linear chain with a varying number of repeat units.

Molecular Biology of the Cell by Alberts, Bray, Lewis, Raff, Roberts and Watson, 3rd ed., Garland Publishing, Inc., New York, 1994, p. 345
Figure 3 Long loops of 30 nm particles are thought to be closed at the bottom of the loop by proteins.

Molecular Biology of the Cell by Alberts, Bray, Lewis, Raff, Roberts and Watson, 3rd ed., Garland Publishing, Inc., New York, 1994, p. 348

Figure 4 (a) Transcription and Editing, (b) Movement of the RNA through the 
Nucleus, (c) Connectron Formation 

Figure 5 Overview of the schematic organization of a typical transcriptionally active chromosomal loop.

From http://genomatix.gsf.de/func_genomics/functional_genomics.html



Table 1 Connectron Properties for Prokaryotic, Archea and Eukaryotic Genomes

Table 2 Yeast Inter-Chromosomal Connectron Distribution
Figure 6 Genome size plotted as a log-log function of the Number of Connectrons

Figure 7 Number of Sequence Instances plotted as a function of the Number of Fragments

Figure 8 Level 0 - The overall view of the algorithm

Figure 9 Level 1 - Process Flow of the Algorithm

Figure 10 Level 2a - two pages - Process Genome into Blocking Fragment File

Figure 11 Level 2b - two pages - Compute the Connectrons for a Genome

Figure 12 Level 2c - two pages - Analyze Possible Connectrons

Figure 13 Level 3a - Setup Genome Usage Memory

Figure 14 Level 3b - Find DBP-Size Blocking File for T1-Window

Figure 15 Level 3c - Find DBP-Size Blocking File for T2-Window

Figure 16 Level 3d - two pages - Find C1/C2 Entries

Figure 17 Level 3e - two pages - Scan Genome Usage Memory for Potential Connectrons
Description of the Invention

A connectron is a relationship among four DNA sequences, each of which must be at least 20 bases long. Sharp and Zamore [3] report that RNA sequences of "about length 25" are important as sources of RNAi; in the computations described here, 27 bases were actually used as the minimum length of each sequence. The T1 sequence is on one strand of a chromosome in a genome. The T2 sequence is on the same strand of the same chromosome as the T1 sequence. The T1 and T2 sequences (each at least 20 bases in length) must be at least 5,000 bases apart, but they cannot be more than 105,000 bases apart. The C1 sequence and the C2 sequence (each at least 20 bases in length) are adjacent to each other on one strand of a chromosome in the genome. The C1/C2 sequences, called the "short loop", can be on the same strand as the T1 and T2 sequences or on the opposite strand. The C1/C2 sequences of the short loop can be on the same chromosome as the T1 and T2 sequences, but they can also be on a different chromosome in the genome. When a genome has only one chromosome, the point is moot; many genomes, of course, have several chromosomes. The C1 sequence is identical to the T1 sequence and the C2 sequence is identical to the T2 sequence.
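The constraints in the paragraph above can be collected into a single predicate. The following Python sketch is illustrative only: the `Site` record and the function name are hypothetical, the T1-to-T2 distance is measured start to start with T2 downstream of T1, and "adjacent" is read as "C2 begins no more than 200 bases after C1 ends", borrowing the 200-base figure from the detailed description of the algorithm. None of these conventions is stated as such in the text.

```python
from dataclasses import dataclass

# Thresholds taken from the text; all are adjustable assumptions.
MIN_LEN = 27        # minimum length used for each of T1, T2, C1 and C2
MIN_SPAN = 5_000    # minimum T1-to-T2 distance (measured start to start here)
MAX_SPAN = 105_000  # maximum T1-to-T2 distance
MAX_C_GAP = 200     # assumed meaning of "adjacent" for C1 and C2


@dataclass(frozen=True)
class Site:
    """Hypothetical record for one sequence occurrence."""
    chrom: int   # chromosome number
    pos: int     # 0-based start position on the chromosome
    strand: int  # 1 = positive strand, 2 = negative strand
    seq: str     # bases at this site


def is_possible_connectron(t1: Site, t2: Site, c1: Site, c2: Site) -> bool:
    """Return True if the four sites satisfy the tetradic constraints."""
    if any(len(s.seq) < MIN_LEN for s in (t1, t2, c1, c2)):
        return False
    # Identity: C1 = T1 and C2 = T2.
    if c1.seq != t1.seq or c2.seq != t2.seq:
        return False
    # Long loop: same chromosome and strand, T2 downstream of T1 within the window.
    if (t1.chrom, t1.strand) != (t2.chrom, t2.strand):
        return False
    if not MIN_SPAN <= t2.pos - t1.pos <= MAX_SPAN:
        return False
    # Short loop: C1 then C2 on one strand of one chromosome, which may differ
    # from the chromosome and strand that carry T1 and T2.
    if (c1.chrom, c1.strand) != (c2.chrom, c2.strand):
        return False
    gap = c2.pos - (c1.pos + len(c1.seq))
    return 0 <= gap <= MAX_C_GAP


if __name__ == "__main__":
    a, b = "A" * 27, "C" * 27
    t1, t2 = Site(1, 1_000, 1, a), Site(1, 11_000, 1, b)
    c1, c2 = Site(2, 500, 2, a), Site(2, 527, 2, b)
    print(is_possible_connectron(t1, t2, c1, c2))  # True
```

In the usage example the short loop sits on a different chromosome and strand from the long loop, which the text explicitly allows.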

The C1/C2 sequence must be on the same strand as a gene and must either be directly adjacent to the gene (i.e. a gap of 0 bases) in prokaryotic genomes or, at present, lie within 10,000 bases of it in eukaryotic genomes. The size of the gap between the end of the gene and the beginning of the C1/C2 sequence is an adjustable parameter. The C1/C2 short loop is expressed as the 3'UTR (Un-Translated Region) of the gene. In the case of prokaryotic genes, which do not normally have introns, the whole mRNA becomes the active species for connectron formation. In the case of eukaryotic genes, the whole transcript becomes the active species for connectron formation once the introns have been removed from it. The whole transcript can then move about in the cytoplasm of prokaryotic cells or in the nucleus of eukaryotic cells. Since the C1 sequence is equivalent to the T1 sequence and the C2 sequence is equivalent to the T2 sequence, the C1 RNA can form a Hoogsteen triple-stranded RNA/DNA/DNA helix with the double-stranded T1 sequence. Similarly, the C2 RNA can form a Hoogsteen triple-stranded RNA/DNA/DNA helix with the double-stranded T2 sequence. Because the C1 sequence and the C2 sequence are adjacent to each other, the C1/T1 RNA/DNA/DNA Hoogsteen triple helix is brought into physical adjacency with the C2/T2 RNA/DNA/DNA Hoogsteen triple helix. RNA/DNA/DNA hybrid helices are the most stable form of triple helix. RNA double helices, DNA double helices, RNA triple helices and DNA triple helices are all significantly less stable than an RNA/double-stranded DNA triple helix. The stable physical adjacency of the two triple-stranded Hoogsteen helices ensures that the long loop of double-stranded DNA between the T1 sequence and the T2 sequence can then be structured into 30 nm chromatin particles, as shown in level 4 of figure 1. Once the DNA between the T1 sequence and the T2 sequence is structured into 30 nm chromatin particles, the genes on either strand of that DNA are not open to promotion and expression.
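The requirement that the C1/C2 sequence lie on the same strand as a gene, either immediately after it (prokaryotes) or within 10,000 bases of it (eukaryotes), can be sketched as a gap test measured from the gene's 3' end. The `Gene` record, the 0-based end-exclusive coordinates and the function names below are hypothetical; only the two gap limits come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

PROKARYOTE_MAX_GAP = 0       # C1/C2 directly adjacent to the gene
EUKARYOTE_MAX_GAP = 10_000   # current limit for eukaryotic genomes


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene-feature record (0-based, end-exclusive coordinates)."""
    name: str
    chrom: int
    start: int
    end: int
    strand: int  # 1 = positive, 2 = negative


def three_prime_gap(gene: Gene, chrom: int, start: int, end: int,
                    strand: int) -> Optional[int]:
    """Bases between the gene's 3' end and the C1/C2 region, or None if the
    region is not downstream of the gene on the same strand."""
    if (chrom, strand) != (gene.chrom, gene.strand):
        return None
    # On the positive strand the 3' end is the right-hand end of the gene;
    # on the negative strand it is the left-hand end.
    gap = start - gene.end if strand == 1 else gene.start - end
    return gap if gap >= 0 else None


def adjacent_gene(genes: Iterable[Gene], chrom: int, start: int, end: int,
                  strand: int, max_gap: int) -> Optional[Tuple[str, int]]:
    """Return (gene name, gap) for the closest qualifying gene, if any."""
    best: Optional[Tuple[str, int]] = None
    for gene in genes:
        gap = three_prime_gap(gene, chrom, start, end, strand)
        if gap is not None and gap <= max_gap and (best is None or gap < best[1]):
            best = (gene.name, gap)
    return best
```

For example, with a positive-strand gene ending at position 1000, a C1/C2 region starting at 1005 has a gap of 5 and qualifies under `EUKARYOTE_MAX_GAP` but not under `PROKARYOTE_MAX_GAP`.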

The tetradic relationship between the T1 and T2 sequences that form the long loop and the C1/C2 sequences that form the short loop is called a connectron. The name "connectron" was suggested by J. David Rawn Ph.D. of Towson University. A connectron is possible if the T1, T2, C1 and C2 sequences exist. A connectron is real if the C1/C2 short loop sequence is adjacent to an expressible gene. If the adjacent gene lies inside one or more T1-T2 long loops, the connectron is said to be transient. If the adjacent gene is not inside any possible T1-T2 long loop, the connectron is said to be permanent. If a connectron is inside a T1-T2 long loop that has the same sequences (i.e. T1 is really equal to C1 and T2 is really equal to C2), the connectron is said to be self-limiting: once the C1/C2 sequence is expressed, it forms the T1-T2 long loop that then shuts off the expression of the gene adjacent to the C1/C2 sequence. Self-limiting connectrons can also be called "spike" connectrons, since they generate a short-duration spike of the C1/C2 short loop sequence. If a T1-T2 long loop does not contain any genes but does contain C1/C2 short loop sequences, the connectron is said to be geneless. The C1/C2 short loops within a geneless T1-T2 long loop can, of course, control the expression of genes.
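The vocabulary above (possible, real, transient, permanent, self-limiting, geneless) reduces to a few containment tests. The sketch below encodes one reading of those definitions; the record types, the use of a single position per gene, and the choice to test the adjacent gene's position for enclosure are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class LongLoop:
    """Hypothetical T1-T2 long loop: T1 at `start`, T2 at `end`."""
    chrom: int
    start: int
    end: int
    t1: str
    t2: str

    def contains(self, chrom: int, pos: int) -> bool:
        return chrom == self.chrom and self.start < pos < self.end


@dataclass(frozen=True)
class ShortLoop:
    """Hypothetical C1/C2 short loop and the gene it is 3'UTR to (if any)."""
    chrom: int
    pos: int
    c1: str
    c2: str
    gene_pos: Optional[int]  # position of the adjacent expressible gene


def classify(short: ShortLoop, loops: Iterable[LongLoop]) -> List[str]:
    """Label one connectron's short loop using the definitions in the text."""
    if short.gene_pos is None:
        return ["possible"]  # sequences exist, but no adjacent expressible gene
    labels = ["real"]
    enclosing = [lp for lp in loops if lp.contains(short.chrom, short.gene_pos)]
    labels.append("transient" if enclosing else "permanent")
    # Self-limiting: enclosed by a long loop built from its own sequences.
    if any(lp.t1 == short.c1 and lp.t2 == short.c2 for lp in enclosing):
        labels.append("self-limiting")
    return labels


def is_geneless(loop: LongLoop, genes: Iterable[Tuple[int, int]],
                shorts: Iterable[ShortLoop]) -> bool:
    """True if the long loop encloses no genes but does enclose short loops."""
    no_genes = not any(loop.contains(c, p) for c, p in genes)
    return no_genes and any(loop.contains(s.chrom, s.pos) for s in shorts)
```

Because every real connectron is also possible, `classify` reports "possible" only for connectrons that are not real.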
The physical existence and lifetimes of connectrons must be proved by molecular biological experimentation. This physical experimental process, however, is logically quite separate from the computational experimentation that has been conducted from June 1999 to May 2001. The computational search for the existence of connectrons has been extremely positive. These computations have shown that connectrons exist in prokaryotes, in archea, between prokaryotes and their plasmids, in single-celled eukaryotes, in multi-celled eukaryotes, in plants, in higher animals and in humans. All of these features and properties are described in the claims section that follows.

The connectron invention is very powerful. It depends only on sequence equivalency. The minimum length of the four sequences seems to be about 20 bases; in the calculations shown in this patent application, 27 bases have been used as the minimum.

The Nature News Feature [1] says that other scientists have found RNA sequences of length about 25 that have interesting gene silencing properties. The Nature article does not give any mechanism. Because of my algorithm and its use on a variety of genomes, this patent application provides the computational proof that a particular mechanism is highly probable. The connectron invention provides an explanation for how communication occurs within a chromosome as well as between chromosomes in genomes that have more than one chromosome. Since each T1-T2 long loop can contain one or more genes, the connectron invention provides a mechanism for turning sets of genes on and off simultaneously. In time, the connectron invention will provide an explanation for differentiation, that is, for how one cell's behavior differs from the behavior of another, adjacent cell. It is already clear from the computational experiments made on S. cerevisiae, C. elegans and D. melanogaster that the number of geneless connectrons increases dramatically as evolution proceeds from single-celled eukaryotes (e.g. S. cerevisiae) to eukaryotes of about 1,000 cells (e.g. C. elegans) to visible creatures (e.g. D. melanogaster). Extending this evolutionary progression to plants (e.g. A. thaliana) and to humans (H. sapiens) is currently limited, because only three chromosomes of A. thaliana are sequenced and only one human chromosome is completely sequenced. Although the complete human genome was published in Nature and Science in February 2001, the NIH-sponsored genomic sequencing results are available for only about 1/3 of the bases in the whole genome. The human genomic sequence determined by Celera Genomics, Inc. is available only by subscription. Table 1 shows how the genome size, the number of genes, the number of gene-containing and geneless connectrons and the percentage of genes controlled are related in many different genomes.

The C1/C2 short loops originate on one chromosome. The T1-T2 long loops can be on the same chromosome or on different chromosomes. Table 2, for yeast (S. cerevisiae), is a square matrix showing how many C1/C2 short loops on a given chromosome are sent to form T1-T2 long loops on each chromosome. The diagonal of this matrix shows that many chromosomes send connectrons to themselves. The striking feature of this particular table is that chromosome 6 sends connectrons only to chromosome 12 but receives connectrons from chromosomes 4, 5, 7, 10, 12, 13, 15 and 16.
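A matrix like Table 2 can be tallied from a list of connectrons, each reduced to the chromosome of its C1/C2 short loop (row) and the chromosome of its T1-T2 long loop (column). In the sketch below the function names are hypothetical, and the pairs in the usage example are made up for illustration; they are not the yeast data of Table 2.

```python
from typing import Iterable, List, Tuple

Matrix = List[List[int]]


def chromosome_matrix(pairs: Iterable[Tuple[int, int]], n_chrom: int) -> Matrix:
    """Count connectrons by (C1/C2 chromosome, T1-T2 chromosome), 1-based."""
    m = [[0] * n_chrom for _ in range(n_chrom)]
    for src, dst in pairs:
        m[src - 1][dst - 1] += 1
    return m


def sends_to(m: Matrix, chrom: int) -> List[int]:
    """Chromosomes that receive long loops from short loops on `chrom`."""
    return [j + 1 for j, count in enumerate(m[chrom - 1]) if count]


def receives_from(m: Matrix, chrom: int) -> List[int]:
    """Chromosomes whose short loops form long loops on `chrom`."""
    return [i + 1 for i, row in enumerate(m) if row[chrom - 1]]


if __name__ == "__main__":
    demo = [(6, 12), (4, 6), (5, 6), (6, 12), (3, 3)]  # illustrative only
    m = chromosome_matrix(demo, 16)
    print(sends_to(m, 6), receives_from(m, 6), m[2][2])  # [12] [4, 5] 1
```

Reading a row gives what a chromosome sends, reading a column gives what it receives, and the diagonal counts connectrons a chromosome sends to itself.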

Any tetrad of connectron sequences (i.e. the T1, T2, C1 and C2 sequences), together with the adjacency of the C1/C2 short loop sequence to the transcribing gene, can be patented because the composition of matter and the utility can be exactly described. The utility of a connectron is that the T1-T2 long loop shuts off the expression of the genes that lie between the T1 sequence and the T2 sequence. In the case of geneless connectrons, the utility is of a higher level: the C1/C2 short loops contained in the higher-level geneless T1-T2 long loop eventually form other, lower-level T1-T2 long loops around a set of genes.
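The higher-level role of geneless connectrons can be pictured as a graph walk: a short loop forms long loops, which enclose genes and further short loops, which in turn form further long loops. The sketch below collects every gene reachable from one C1/C2 short loop. The dictionary inputs and the function name are hypothetical, and the sketch deliberately does not decide whether a given gene ends up switched on or off.

```python
from collections import deque
from typing import Dict, List, Set


def genes_under(short_loop: str,
                forms: Dict[str, List[str]],
                shorts_in: Dict[str, List[str]],
                genes_in: Dict[str, List[str]]) -> Set[str]:
    """Collect every gene reachable from one C1/C2 short loop.

    forms[s]     -> long loops that short loop s can close
    shorts_in[L] -> C1/C2 short loops lying inside long loop L
    genes_in[L]  -> genes lying inside long loop L (empty if L is geneless)
    """
    genes: Set[str] = set()
    seen_shorts = {short_loop}
    seen_longs: Set[str] = set()
    queue = deque([short_loop])
    while queue:  # breadth-first walk down the hierarchy
        short = queue.popleft()
        for loop in forms.get(short, []):
            if loop in seen_longs:
                continue  # guards against cycles, e.g. self-limiting loops
            seen_longs.add(loop)
            genes.update(genes_in.get(loop, []))
            for inner in shorts_in.get(loop, []):
                if inner not in seen_shorts:
                    seen_shorts.add(inner)
                    queue.append(inner)
    return genes
```

With a geneless loop `G1` that encloses two short loops, each of which closes a gene-bearing loop, the walk starting from the short loop that forms `G1` reaches the genes of both lower-level loops.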

The invention of connectrons comes at a particularly important time in biological discovery. The geneless connectrons make a many-to-many hierarchical control mechanism possible. It is already clear from the determination of the connectrons for C. elegans and D. melanogaster that there are as many geneless connectrons as there are genes, or more. It has been clear for some time that the number of genes in a genome is not particularly correlated with the size of the genome. Figure 6 shows that the size of a genome is roughly linearly correlated with the number of connectrons.
The connectron invention can be used to generate a model of behavior in any cell. The simulation of connectron behavior in different genomes will be the subject of another patent application.

The connectron invention provides for a rational exploitation of the information contained in the raw genomic DNA sequence by forming a hierarchy of relationships among geneless connectrons, transient connectrons, permanent connectrons, self-limiting connectrons and the expression of genes.



WO 01/94542                                              PCT/US01/16471



Detailed Description of the Invention 

The algorithm for the determination of connectrons in any genome or genome fragment is represented in the following flow diagrams. The Level 0 diagram in figure 8 shows the general relationships in a digital computer. The central processor of the digital computer uses the computer program to take the genome descriptors, the genomic DNA sequences and the tables of gene features and produce a file of blocking fragments and a file of the optimal connectrons for the genome. The printer serves to make hard copies of the files and of this patent application. The level 1 diagram in figure 9 shows the three essential steps in the determination of connectrons. First, the genome is processed into a blocking fragment file. Then the blocking fragments are used to compute the connectrons for the genome. Finally, the potential connectrons are analyzed to determine whether the C1/C2 sequences are in the 3'UTR of a gene. The level 2a diagram in figure 10 shows the steps required to process the genome into a file of blocking fragments. The genomic DNA sequence is decomposed into overlapping 27-base frames, one starting at each base position, on both the positive and negative strands. These fragments are written to the unsorted fragment file. The fragment file is then sorted, read and formed into groups of identical sequences. The (.blk) file contains each sequence, its number of instances and a pointer into the (.gptr) file, which contains the positions of the fragments in the genome. Each position includes the chromosome number, the position in the chromosome and the strand (positive or negative). Samples of these files follow.



Sample of the (.blk) file for S. cerevisiae

27-base fragment             Number of   Pointer to
                             instances   (.gptr) file

111111111111111111111111111          0            1
111111123244233313332443414          1            2
111111141113443133314333341          2            4
111111232442333133324434141          1            5
111111323311133323144423444          2            7
111111332213331341414443413          2            9
111111333444112343412323243          1           10
111111333444113343412323243          9           19
111111411134431333143333414          2           21
111111443223134142124434124          2           23
111112223234344444443144442          2           25
111112244123441122214421213          8           33
111112311241114344334134431          2           35
111112324423331333244341414          1           36
111112344232231344242234342          1           37
111112433444244421144134211          1           38
111112444311313442332142224          1           39
111113131241131114424413231          1           40
111113143332344311113133411          1           41
111113233111333231444234441          2           43

In the fragments above, 1 = G, 2 = C, 3 = A and 4 = T.
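The digit code used in the (.blk) sample can be reproduced with a small lookup table. Because G, C, A and T map to 1, 2, 3 and 4, sorting the encoded strings orders fragments G < C < A < T, which is consistent with the order of the rows in the sample. The helper names below are hypothetical, and the reverse complement is shown only as one plausible way to obtain negative-strand fragments.

```python
CODE = {"G": "1", "C": "2", "A": "3", "T": "4"}
DECODE = {digit: base for base, digit in CODE.items()}
COMPLEMENT = str.maketrans("GCAT", "CGTA")


def encode(seq: str) -> str:
    """Map bases to the digit code of the (.blk) sample (1=G, 2=C, 3=A, 4=T)."""
    return "".join(CODE[base] for base in seq.upper())


def decode(digits: str) -> str:
    """Inverse of encode()."""
    return "".join(DECODE[d] for d in digits)


def reverse_complement(seq: str) -> str:
    """Negative-strand reading of a positive-strand sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    row = "111111123244233313332443414"  # second fragment of the sample above
    print(decode(row))  # GGGGGGGCACTTCAAAGAAACTTATGT
```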

Sample of the (.gptr) file for S. cerevisiae
There are 16 chromosomes in S. cerevisiae.

Item  Chromosome  Position in  Direction
                  chromosome
   1           0            0          0
   2           4        11137          1
   3          12       467619          1
   4          12       458482          1
   5           4        11138          1
   6          12       465759          2
   7          12       456622          1
   8           1       219366          1
   9           8       539978          1
  10          14       522451          1
  11           4      1099073          1
  12           4      1210003          1
  13           7       539068          1
  14          12       654136          1
  15          12       596455          1
  16          15       121016          1
  17          15       598127          2
  18          16       847724          1
  19          16        59765          1
  20          12       467620          1
  21          12       458483          1
  22          12       461657          1
  23          12       452520          1
  24          13       838006          1
  25          15       288270          1
  26           4        83593          1
  27           4       992867          1
  28           6       162265          1
  29           7       845687          1
  30          10       531560          2
  31          15       282208          1
  32          16       860418          1
  33          16       572308          1
  34          12       465992          1
  35          12       456855          1
  36           4        11139          1
  37           8        89343          1
  38           4        10302          1
  39           1        19894          2
  40          16         9311          1
  41          10       735203          1
  42          12       465760          1
  43          12       456623          1

In the Direction column above, 1 = positive strand and 2 = negative strand.
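The processing step of figure 10 can be sketched as follows: cut every chromosome into overlapping frames on both strands, group identical fragments, sort the groups in digit-code order and write their positions out consecutively. The pointer convention used here (each (.blk) row points at the 1-based index of its last (.gptr) item, with item 1 a null entry) is inferred from the samples above rather than stated in the text. Recording negative-strand hits at their leftmost positive-strand coordinate, using 0-based positions and skipping frames that contain bases other than A, C, G and T are further assumptions, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CODE = {"G": "1", "C": "2", "A": "3", "T": "4"}
COMPLEMENT = str.maketrans("GCAT", "CGTA")

BlkRow = Tuple[str, int, int]   # (encoded fragment, instances, pointer)
GptrRow = Tuple[int, int, int]  # (chromosome, 0-based position, strand)


def build_blocking_files(chromosomes: Dict[int, str],
                         frame: int = 27) -> Tuple[List[BlkRow], List[GptrRow]]:
    groups: Dict[str, List[GptrRow]] = defaultdict(list)
    for chrom, seq in chromosomes.items():
        seq = seq.upper()
        n = len(seq)
        rc = seq.translate(COMPLEMENT)[::-1]  # negative strand, read 5' to 3'
        for pos in range(n - frame + 1):
            fwd = seq[pos:pos + frame]
            if set(fwd) <= set(CODE):
                groups[fwd].append((chrom, pos, 1))
        for p in range(n - frame + 1):
            rev = rc[p:p + frame]
            if set(rev) <= set(CODE):
                # rc[p:p+frame] covers seq[n-p-frame : n-p] on the positive strand
                groups[rev].append((chrom, n - p - frame, 2))

    def encode(s: str) -> str:
        return "".join(CODE[b] for b in s)

    gptr: List[GptrRow] = [(0, 0, 0)]  # item 1: null entry, as in the sample
    blk: List[BlkRow] = []
    for frag in sorted(groups, key=encode):  # digit order: G < C < A < T
        gptr.extend(groups[frag])
        blk.append((encode(frag), len(groups[frag]), len(gptr)))
    return blk, gptr
```

Writing positions in sorted fragment order means all instances of one fragment are contiguous in the (.gptr) list, so a row's instance count and pointer together locate every occurrence without a search.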

The level 2b diagram in figure 11 shows the computation of the connectrons. The genome descriptors consist of the number and lengths of the chromosomes. The algorithm uses an array that records several facts about each base position in the genome. The level 3a diagram in figure 13 shows the setup of this Genome-Usage memory. The gene features are used to prevent the protein-coding regions of the genome from being used for the connectron sequences (i.e. the T1s, the T2s, the C1s and the C2s). In the level 2a diagram of figure 10, the algorithm steps through each chromosome, and within each chromosome through each base position, looking for acceptable T1-windows of 27 bases. A T1-window can be used to form a connectron relationship if there are two or more instances of its fragment in the blocking fragment file. The computation in the level 3b diagram of figure 14 determines whether or not the T1-window is acceptable. Once an acceptable T1-window is found, the algorithm (in the level 2a diagram of figure 10) looks for acceptable T2-window positions that lie between 5,000 and 105,000 bases from the T1-window. The computation for determining acceptable T2-window positions is done in the level 3c diagram of figure 15. Once a pair of T1 and T2 window positions is found, the algorithm looks among the instances of these T1 and T2 sequences for a pair of sequences C1 and C2 that lie within 200 bases of each other on the same chromosome. The computation for determining acceptable C1/C2 windows is shown in the level 3d diagram in figure 16. In the level 3e diagram of figure 17, the Genome-Usage memory is scanned for the Possible-Connectrons. In the level 2c diagram of figure 12, the Possible-Connectrons are scanned to determine whether the C1/C2 sequences are within the Gap-Distance of a gene on either the positive or the negative strand. The Real-Connectrons are then written out in several different files, including the descriptions in the claims section.
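The window search described above can be sketched by brute force for small inputs. This is not the author's program: it scans positive-strand windows only, replaces the Genome-Usage memory with a simple boolean mask of protein-coding bases, omits the DBP-size blocking files, and treats a window as acceptable when it avoids the mask and occurs at least twice. The function names, parameters and result tuple are hypothetical; the defaults mirror the 27-base frame, the 5,000 to 105,000 base span and the 200-base C1/C2 limit given in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[int, int]  # (chromosome, 0-based start), positive strand only
Found = Tuple[int, int, int, int, int, int]  # (chrom, T1, T2, C chrom, C1, C2)


def build_index(chromosomes: Dict[int, str], frame: int) -> Dict[str, List[Hit]]:
    index: Dict[str, List[Hit]] = defaultdict(list)
    for chrom, seq in chromosomes.items():
        for pos in range(len(seq) - frame + 1):
            index[seq[pos:pos + frame]].append((chrom, pos))
    return dict(index)


def find_possible_connectrons(chromosomes: Dict[int, str],
                              coding: Dict[int, List[bool]],
                              frame: int = 27, min_span: int = 5_000,
                              max_span: int = 105_000,
                              max_c_gap: int = 200) -> List[Found]:
    index = build_index(chromosomes, frame)

    def acceptable(chrom: int, pos: int) -> bool:
        # Reject windows that overlap protein-coding bases.
        if any(coding[chrom][pos:pos + frame]):
            return False
        # Require at least two instances of the fragment in the genome.
        return len(index.get(chromosomes[chrom][pos:pos + frame], [])) >= 2

    found: List[Found] = []
    for chrom, seq in chromosomes.items():
        last = len(seq) - frame
        for p1 in range(last + 1):
            if not acceptable(chrom, p1):
                continue
            t1 = seq[p1:p1 + frame]
            for p2 in range(p1 + min_span, min(p1 + max_span, last) + 1):
                if not acceptable(chrom, p2):
                    continue
                t2 = seq[p2:p2 + frame]
                # Look among the instances of T1 and T2 for a close C1/C2 pair.
                for c_chrom, c1_pos in index[t1]:
                    for c2_chrom, c2_pos in index[t2]:
                        gap = c2_pos - (c1_pos + frame)
                        if c2_chrom == c_chrom and 0 <= gap <= max_c_gap:
                            found.append((chrom, p1, p2, c_chrom, c1_pos, c2_pos))
    return found
```

The nested scan costs roughly one T2 pass of up to 100,000 positions per acceptable T1 window, which is why the text's version relies on precomputed blocking files and a usage memory rather than a plain loop like this one.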

Examples

The algorithm for the determination of optimal connectrons has been applied to a number of different publicly available genomes. The connectron is a tetradic relationship among four sequence elements: T1, T2, C1 and C2. The claims presented in this section are written by the program NearGene, which implements the flow diagram Level 2c of figure 12. The examples are written in a uniform style of English. Each example contains some or all of the following elements:

Name of genome
Description of T1
    Length of T1-T2 loop
    The chromosome on which the T1-T2 loop exists
    The identifier number within the genome of the T1 sequence
    The T1 sequence
Description of T2
    The identifier number within the genome of the T2 sequence
    The T2 sequence
A list of genes whose expression is controlled by the T1-T2 loop
    The common names of the genes as obtained from the NCBI gene feature file (.ptt)
A list of C1/C2 short loops whose expression is controlled by the T1-T2 loop
    The chromosome on which the C1/C2 short loop exists
    The common name of the gene which expresses the C1/C2 short loop as an RNA
    The sequence of the C1/C2 short loop
A list of C1/C2 short loops that control the formation of the T1-T2 loop
    The chromosome on which the C1/C2 short loop exists
    The common name of the gene which expresses the C1/C2 short loop as an RNA
    The sequence of the C1/C2 short loop






    The match between the C1/C2 sequence and the T1 sequence
    The match between the C1/C2 sequence and the T2 sequence

These uniform descriptions make it possible to rapidly comprehend the specifics of each example.

When a sequence element is very long, a series of four dots has been inserted between the beginning and ending sequence groups, and a variable number of bases has been deleted at that point.
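The presentation conventions of the examples (sequences broken into fixed-width lines, long elements shortened with a run of dots) can be sketched as two small helpers. The 50-base line width matches the sequence lines printed in the examples; the `head` and `tail` lengths kept around the dots are hypothetical parameters, since the text says only that a variable number of bases is deleted.

```python
from typing import List


def elide(seq: str, head: int = 80, tail: int = 80, marker: str = "....") -> str:
    """Keep the first `head` and last `tail` bases of a long sequence."""
    if head < 0 or tail < 0:
        raise ValueError("head and tail must be non-negative")
    if len(seq) <= head + tail:
        return seq
    return seq[:head] + marker + seq[len(seq) - tail:]


def wrap(seq: str, width: int = 50) -> List[str]:
    """Split a sequence into lines of at most `width` bases."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]
```

Slicing the tail with `len(seq) - tail` rather than `-tail` keeps `tail=0` from accidentally returning the whole sequence.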



Index of Pages for Connectron Samples
Page 39
1. Connectrons occur in prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes.

Page 57
2. Many connectrons control the expression of one set of genes in prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes.

Page 83
3. One connectron controls the expression of many sets of genes in prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes.

Page 107
4. Connectrons occur between prokaryotes and their plasmids.

Page 117
5. Connectrons occur in plants and higher animals.

Page 126
6. Permanent connectrons exist in prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes.

Page 135
7. Transient connectrons exist in prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes.

Page 152
8. Self-limiting connectrons occur in prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes.

Page 164
9. Geneless connectrons exist in single-celled and multi-celled eukaryotes.

Page 174
10. One connectron controls many geneless connectrons in single-celled and multi-celled eukaryotes.
1. Connectrons occur in prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes.

Connectrons exist as tetradic relationships in which the sequence T1 is equivalent to the sequence C1 (written T1=C1) and the sequence T2 equals the sequence C2 (written T2=C2), where T1 and T2 are DNA sequences 20 or more bases in length, the C1 sequence is adjacent to the C2 sequence, the T1 and T2 sequences are on the same chromosome, and the C1/C2 sequences are either on the same chromosome as T1 and T2 or on a different chromosome. The connectron relationship has been found to exist in prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes.

Example of a prokaryote connectron - E. coli

In this example the existence of the T1-T2 (3197-3308) long loop is controlled by three C1/C2 short loops (3307, 3432 and 2218). The T1-T2 long loop controls the expression of 64 genes on chromosome 1 in addition to six C1/C2 short loops (3204, 3206, 3223, 3228, 3301 and 3327). The C1/C2 short loop 3327 lies outside the range of the T1-T2 long loop (3197-3308), but this C1/C2 is expressed as a 3'UTR to the gene hemG, which lies within the range of the T1-T2 long loop.



Diagram: the C1/C2 short loops 3307, 3432 and 2218 (chromosome 1) control the T1-T2 long loop bounded by 3197 and 3308 on chromosome 1, drawn with the C1/C2 short loops 3204, 3206, 3224, 3228, 3301 and 3327 inside it.
Connectron control elements for chromosome 1 of the E. coli genome

A double-stranded DNA loop of length 93.542 kilobases on chromosome 1 is bounded on the left by a T1 sequence whose identifier is 3197. This T1 control element has the DNA sequence

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGG
AATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACG
CCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAA
ATAAATGCTTGACTCTGTAGCGGGAA

This double-stranded DNA loop is bounded on the right by a T2 control element whose identifier is 3308. This T2 control element has the DNA sequence

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTG
ACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAG
AACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAG
GCGTATTATGCACACCCCGCGCCGCT

This long T1/T2 double-stranded DNA loop modulates the expression of the following genes:

rrsC gltU rrlC rrfC aspT trpT yifA yifE yifB ilvL ilvG_1 ilvM ilvE ilvD
ilvA ilvY ilvC ppiC b3776 rep gppA rhlB trxA rhoL rho rfe wzzE wecB rffH
wecD wecE wzxE yifM_2 wecG yifK argX hisR leuT proM aslB aslA hemY hemX
hemD cyaA cyaY b3808 dapF uvrD b3814 corA yigF yigG rarD yigI pldA recQ
yigJ yigK pldB yigL yigM metR metE ysgA udp yigN ubiE yigP b3836 yigU
yigW_1 rfaH yigC ubiB fadA fadB pepQ trkH hemG

This long T1/T2 double-stranded DNA loop modulates the expression of the following C1/C2 short loops:

A C1/C2 short loop on chromosome 1 whose identifier is 3204 controls the expression of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene rrsC and has the DNA sequence

GATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAACGGCTCACCTAGG
CGACGATCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAG
ACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGCACAATG
GGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAA



A C1/C2 short loop on chromosome 1 whose identifier is 3206 controls the expression of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene rrsC and has the DNA sequence

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAG
GGGTTCGAATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTC
ACCTGCCTTAATATCTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTG
CTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAA



A C1/C2 short loop on chromosome 1 whose identifier is 3223 controls the expression of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene rrlC and has the DNA sequence

GCTGAAGTAGGTCCCAAGGGTATGGCTGTTCGCCATTTAAAGTGGTACGC
GAGCTGGGTTTAGAACGTCGTGAGACAGTTCGGTCCCTATCTGCCGTGGG
CGCTGGAGAACTGAGGGGGGCTGCTCCTAGTACGAGAGGACCGGAGTGG
ACGCATCACTGGTGTTCGGGTTGTCATGCCAATGGCA

A C1/C2 short loop on chromosome 1 whose identifier is 3225 controls the expression of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene rrlC and has the DNA sequence

AAACAGAATTTGCCTGGCGGCCGTAGCGCGGTGGTCCCACCTGACCCCAT 
GCCGAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTC 
CCCATGCGAGAGTAGGGAACTGCCAGGCATCAAATTAAGCAGTA 

A C1/C2 short loop on chromosome 1 whose identifier is 3228 controls the expression of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene rrfC and has the DNA sequence

GGTCATAAAACCGGTGGTTGTAAAAGAATTCGGTGGAGCGGTAGTTCAGT 
CGGTTAGAATACCTGCCTGTCACGCAGGGGGTCGCGGGTTCGAGTCCCGT 
CCGTTCCGCCAC 

A C1/C2 short loop on chromosome 1 whose identifier is 3301 controls the expression of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene ubiB and has the DNA sequence

TTATCGTGCCTACAAATAGTCCGAACCGTAGGCCGGATAAGGCGTTTACG 
CCGCATC 
A C1/C2 short loop on chromosome 1 whose identifier is 3307 controls the expression of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene fadA and has the DNA sequence

TGCCGGATGCGGCGTAAACGCCTTATCCGGCCTACGGTTCGGACTATTTGT 
AGGCA 



A C1/C2 short loop on chromosome 1 whose identifier is 3327 controls the expression of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene hemG and has the DNA sequence

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGG
AATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACG
CCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAA
ATAAATGCTTGACTCTGTAGCGGGAAGGCGTATTATG...CCCGTCACACCA
TGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCT
TACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTA
GGGGAACCTGCGGTTGGATCACCTCCTTACCTTAAAGAAGCGTTCTTTG



The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 3307 controls the expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene hemG and has the DNA sequence

AAAAAATGCGCGGTCAGAAAATTATTTTAAA
AATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACG
CCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAA
ATAAATGCTTGACTCTGTAGCGGGAAGGCGTATTATG...CCCGTCACACCA
TGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCT
TACCACTTTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTA
GGGGAACCTGCGGTTGGATCACCTCCTTACCTTAAAGAAGCGTTCTTTG

The match between the T1 sequence and the C1/C2 sequence is

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGG
AATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACG
CCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAA
ATAAATGCTTGACTCTGTAGCGGGAA

The match between the T2 sequence and the C1/C2 sequence is 

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTG 
ACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAG
AACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAG
GCGTATTATGCACACCCCGCGCCGCT 

A C1/C2 short loop on chromosome 1 whose identifier is 3432 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene btuB and has the DNA sequence

TGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT 
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGG 
GTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATG
CTTGACTCTGTAGCGGGAAGGCGTATTATGCACACC..ACACCATGGGAGT 
GGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACT 
TTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAAC 
CTGCGGTTGGATCACCTCCTTACCTTAAAGAAGCGT 

The match between the T1 sequence and the C1/C2 sequence is
TGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGG 
GTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATG 
CTTGACTCTGTAGCGGGAA 


The match between the T2 sequence and the C1/C2 sequence is 

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTG 
ACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAG 
AACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAG
GCGTATTATGCACACCCCGCGCCGCT 

A C1/C2 short loop on chromosome 1 whose identifier is 2218 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene clpB and has the DNA sequence

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAAC 
AACGGCAAACACGCCGCCGGGC 

The match between the T1 sequence and the C1/C2 sequence is

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAAC 
AACGGCAAACACGCCGCCGGGC 

The match between the T2 sequence and the C1/C2 sequence is

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAAC 
AACGGCAAACACGCCGCCGGGTC 



Example of an archea connectron - H. pylori
In this example the existence of the T1-T2 (812-882) long loop is controlled by three
C1/C2 short loops (881, 813 and 1241). The T1-T2 long loop controls the expression
of 54 genes on chromosome 1 in addition to one C1/C2 (843) short loop.

[Diagram: C1/C2 short loops 881, 813 and 1241 on chromosome 1 acting on the T1-T2 long loop 812-882 on chromosome 1.]



Connectron control elements for chromosome 1 of H. pylori genome 

A double stranded DNA loop of length 96.385 kilo-bases on chromosome 1 is 
bounded on the left by a T1 sequence whose identifier is 812. This T1 control
element has the DNA sequence 

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 882. This T2 control element has the DNA sequence 

TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG 
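In this example the last 16 bases of the T1 element (812), TAGCGGAACTAAAGCA, are also the first 16 bases of the T2 element (882). The Python sketch below measures such a suffix/prefix overlap and joins two elements on it. It illustrates an observation about the sequences listed here and is not presented as part of the connectron algorithm; `suffix_prefix_overlap` and `merge_on_overlap` are hypothetical helper names.

```python
def suffix_prefix_overlap(left: str, right: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    # Try the longest possible overlap first and shrink until one matches.
    for k in range(min(len(left), len(right)), max(min_len, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def merge_on_overlap(left: str, right: str, min_len: int = 1) -> str:
    """Join two sequences, writing their shared suffix/prefix only once."""
    k = suffix_prefix_overlap(left, right, min_len)
    return left + right[k:]


# T1 (identifier 812) and T2 (identifier 882) elements of H. pylori, copied from above.
T1_812 = "TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA"
T2_882 = "TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG"

k = suffix_prefix_overlap(T1_812, T2_882)
print(k, T1_812[-k:])  # 16 TAGCGGAACTAAAGCA
print(merge_on_overlap(T1_812, T2_882))
```

Both the 881 and the 813 C1/C2 sequences in this example start with the opening bases of T1 and end with the closing bases of T2, but their middles are elided in this listing, so the joined 70-base sequence is not claimed to be either of them.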

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

HP0999 HP1000 HP1001 HP1002 HP1003 HP1005 HP1006
HP1008 HP1009 HPtRNA-Pro HP1010 HP1011 HP1013 HP1015
HP1017 HP1018 HP1020 HP1021 HP1022 HP1023 HP1024
HP1067 HP1074 HP1076 HP1077 HP1078 HP1079 HP1081
HP1083 HP1084 HP1085 HP1088 HP1091 HP1092 HP1093
HP1094 HP1095 HP1096
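Which genes a long loop modulates follows from where they sit relative to T1 and T2. The Python sketch below picks the genes that lie wholly between the end of T1 and the start of T2. The `Gene` record, the coordinates and the choice to exclude genes that straddle T1 or T2 are assumptions made for illustration; the chromosome coordinates of the 812-882 loop and its genes are not given in this section.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 0-based start on the chromosome (hypothetical coordinates)
    end: int    # exclusive end


def genes_in_loop(genes: Iterable[Gene], t1_end: int, t2_start: int) -> List[str]:
    """Names of genes lying wholly between the end of T1 and the start of T2.

    Genes that overlap T1 or T2 are left out; this is an assumption, since the
    treatment of straddling genes is not stated in the text.
    """
    return [g.name for g in genes if t1_end <= g.start and g.end <= t2_start]


# Hypothetical layout: one gene before T1, two inside the loop, one straddling T2.
genes = [
    Gene("geneA", 100, 900),
    Gene("geneB", 1_200, 2_500),
    Gene("geneC", 3_000, 8_000),
    Gene("geneD", 9_000, 9_500),
]
print(genes_in_loop(genes, t1_end=1_000, t2_start=9_200))  # ['geneB', 'geneC']
```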









This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops

A C1/C2 short loop on chromosome 1 whose identifier is 813 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene HP0998 and has the DNA
sequence

TTTTACTCATAGGGTTTT...
AAACACTAAAGATATTTGG 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 881 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene HP1096 and has the DNA
sequence

TTTTACTCATAGGGTTTTT...
AAACACTAAAGATATTTGG 

The match between the T1 sequence and the C1/C2 sequence is
TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA
The match between the T2 sequence and the C1/C2 sequence is 
TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 813 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene HP0998 and has the DNA sequence

TTTTACTCATAGGGTTTTT...
AAACACTAAAGATATTTGG

A C1/C2 short loop on chromosome 1 whose identifier is 881 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene HP1096 and has the DNA sequence


TTTTACTCATAGGGTTTTTA 
AAACACTAAAGATATTTGG 

The match between the T1 sequence and the C1/C2 sequence is

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA 
The match between the T2 sequence and the C1/C2 sequence is 
TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG

A C1/C2 short loop on chromosome 1 whose identifier is 1241 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene HP1535 and has the DNA sequence

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGC
AAACA 

The match between the T1 sequence and the C1/C2 sequence is
TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA

The match between the T2 sequence and the C1/C2 sequence is 
TAGCGGAACTAAAGCATTCATCCCAAACA 




Example of a single-celled connectron - S. cerevisiae



In this example the existence of the T1-T2 (1352-1416) long loop on chromosome 4
is controlled by one C1/C2 short loop (4213) on chromosome 10. The T1-T2 long 
loop controls the expression of 34 genes on chromosome 4 in addition to one C1/C2 
(1356) short loop. 

[Diagram: C1/C2 short loop 4213 on chromosome 10 acting on the T1-T2 long loop 1352-1416 on chromosome 4, which contains C1/C2 short loop 1356.]



Connectron control elements for chromosome 4 of S. cerevisiae genome

A double stranded DNA loop of length 68.908 kilo-bases on chromosome 4 is
bounded on the left by a T1 sequence whose identifier is 1352. This T1 control
element has the DNA sequence


TTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAA 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 1416. This T2 control element has the DNA sequence 


ATTAGATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCAACTATCA 
TCTACTAACTAGTATTTACGTTACTAGTATATTATCATATACGGTGTTAGA 
AGATGACGCAAATGATGAGAAATAGTCATCTAAATTAGTGGAAGCTGAA 
ACGCAAGGATTGATAATGTAATAGGATCAATGAATATTAACATATAAAAC 
GATGATAATAATATTTATAGAATTGTGTAGAATTGCAGATTCCCTTTTATG
GATTCCTAAATCCTTGAGGAGAACTTCTAGTATATCTACATACCTAATATT 
ATAGCCTTAATCACAATGGAATCCCAACAATTACATCAAAATCCACATTC 
TCTACAGTA 

This long T1/T2 double stranded DNA loop modulates the expression of the
following genes

YDR170W-A YDR171W YDR172W YDR173C YDR174W YDR175C 
YDR176W YDR177W YDR178W YDR179C YDR179W-A YDR180W 
YDR181C YDR182W YDR183W YDR184C YDR185C YDR186C
YDR187C YDR188W YDR189W YDR190C YDR191W YDR192C 
YDR193W YDR194C YDR195W YDR196C YDR197W YDR198C 
YDR199W YDR200C YDR201W YDR202C YDR203W YDR204W 
YDR205W YDR206W YDR207C YDR208W YDR209C YDR210W 


This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 

A C1/C2 short loop on chromosome 4 whose identifier is 1356 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene YDR170W-A and
has the DNA sequence

AATCACACTAATCATTGTGATGATGAACTCCCTGGACACCTCCTTCTCGAT 
TCAGGAGCATCACGAACCCTTATAAGATCTGCTCATCACATACACTCAGC 
ATCATCTAATCCTGACATAAACGTAGTTGATGCTCAAAAAAGAAATATAC 
CAATTAACGCTATTGGTGACCTACAATTTCACTTCCAGGACAACACCAAA
ACATCAATAAAGGTATTGCACACTCCTAACATAGCCTATGACTTACTCAGT 
TTGAATGAATTGGCTGCAGTAGATATCACAGCATGCTTTACCAAAAACGT 
CTTAGAACG 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 10 whose identifier is 4213 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YJR029W and has the DNA
sequence

ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTAT 
CAACTAATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGAT 
GACATAAGTTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAACGC
AAGGATTGATAATGTAATAGGATCAATGAATATAAACATATAAAACGGA 
ATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGA 
GGATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTAATA 
TTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACAT 

The match between the T1 sequence and the C1/C2 sequence is
TTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAA 
The match between the T2 sequence and the C1/C2 sequence is 
ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATC 



Example of a multi-celled connectron - C. elegans



In this example the existence of the T1-T2 (95-138) long loop on chromosome 1 is
controlled by three C1/C2 short loops on chromosome 5 (21719, 21949 and 21655).
The T1-T2 long loop controls the expression of four genes on chromosome 1 in
addition to seven C1/C2 (119, 122, 125, 130, 132, 134 and 136) short loops.
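The C. elegans example describes two kinds of link: the short loops on chromosome 5 control the 95-138 long loop, and that long loop in turn modulates the seven short loops lying inside it. A small directed graph makes the chain explicit. The Python sketch below, with hypothetical node labels, walks it breadth-first; the long loops controlled by short loops 119 through 136 are not listed in this section, so the walk stops at those short loops.

```python
from collections import deque
from typing import Dict, List, Set

# Edges: a short loop -> the long loop it controls; a long loop -> the short
# loops lying inside it. Labels are illustrative; data from the C. elegans example.
EDGES: Dict[str, List[str]] = {
    "C1/C2 21719": ["T1-T2 95-138"],
    "C1/C2 21949": ["T1-T2 95-138"],
    "C1/C2 21655": ["T1-T2 95-138"],
    "T1-T2 95-138": [f"C1/C2 {i}" for i in (119, 122, 125, 130, 132, 134, 136)],
}


def downstream(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """All elements reachable from `start` by following control links (BFS)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(downstream("C1/C2 21719", EDGES)))
```

Adding the long loops that 119 through 136 control, once known, would only mean adding their edges; the traversal stays the same.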



[Diagram: C1/C2 short loops 21719, 21949 and 21655 on chromosome 5 acting on the T1-T2 long loop 95-138 on chromosome 1, which contains C1/C2 short loops 119, 122, 125, 130, 132, 134 and 136.]



A double stranded DNA loop of length 41.978 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 95. This T1 control element
has the DNA sequence

CAGCACGTTCTTAACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTC
CCGC 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 138. This T2 control element has the DNA sequence 

ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATCA 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

Y73A3A.1 Y73A3A.1 ZC123.3 ZC123.2 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is 119 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene ZC123.3 and has the
DNA sequence

TTGAGAACTCTGCGTCTCAACTGCCGCATTTTTTGTAGATCTACGTAGATC 
AAACCGAAATGGGACACT 

A C1/C2 short loop on chromosome 1 whose identifier is 122 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene ZC123.3 and has the
DNA sequence

GCACGGGGTTCTGGCCTTCCTCATTGAATTTTTCGCGCTCCATTGACAATC 
GCCTGCCGGACAACGCGTGGGAAAGTCGTGTACTCCAC 

A C1/C2 short loop on chromosome 1 whose identifier is 125 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene ZC123.3 and has the
DNA sequence

ACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGTTC 
GGCAAACTCTTTCATTTCAATTTATGAGGGAAGCCAGAA 

A C1/C2 short loop on chromosome 1 whose identifier is 130 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene ZC123.2 and has the
DNA sequence

CTCCCGCATTTTTTGTAGATCTA

CTGAATCCACGAGCTAGGCTTAAGCTTAGGCTTAAGCTTAGGCCTTTTCTC
AGGCTTAGGCTTAGGCTTA

A C1/C2 short loop on chromosome 1 whose identifier is 132 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene ZC123.2 and has the
DNA sequence

GCTTATGCTTGGGCTTAGGCTTAGGCGTAGGCTTAGGCTTAGGCTTAGGCT 
TATGCTTAGACTTAGTCTCACTATCAGTCTTAGGCTTAGGCTTAGACTTAG 
GCTTAAGCTTAGGCTTAAGCTTAGACTTAGGCTTAGGCTTAGGCTTAGGCT
TAGGCTTAGGTTTGGGCTTAGGCTTAGGCTTAACCTC 

A C1/C2 short loop on chromosome 1 whose identifier is 134 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene ZC123.2 and has the
DNA sequence

TCTGCGTCTTTTCTCCCGCATTTTTTGTAGATCTACGTAGATCAAACCGAA
ATGAGGCACTTTCTGAATCCACGAGCTAGGCTTAAGCTTAGGCTTAAGCTT
AGGCCTTTTCTCAGGCTTAGGCTTAGGCTTA

A C1/C2 short loop on chromosome 1 whose identifier is 136 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene ZC123.2 and has the
DNA sequence

GCTTATGCTTGGGCTTAGGCTTAGGCGTAGGCTTAGGCTTAGGCTTAGGCT
TATGCTTAGACTTAGTCTCACTATCAGTCTTAGGCTTAGGCTTAGACTTAG 
GCTTAAGCTTAGGCTTAAGCTTAGACTTAGGCTTAGGCTTAGGCTTAGGCT 
TAGGCTTAGGTTTGGGCTTAGGCTTAGGCTTAACCTC 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 5 whose identifier is 21719 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene C39F7.5 and has the DNA sequence

ACGTTCTTAACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC 
ATTTTTTGTAGATC 

The match between the T1 sequence and the C1/C2 sequence is

ACGTTCTTAACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC

The match between the T2 sequence and the C1/C2 sequence is

ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATC
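A match here is a stretch of the C1/C2 sequence that is identical to part of a T1 or T2 element. One straightforward way to find such a stretch is a longest-common-substring search. The Python sketch below uses the textbook dynamic-programming version; it is an assumption about how a match could be computed, not the patented procedure, and it ignores mismatches and gaps. Applied to the 21719 C1/C2 sequence and the T1 (95) and T2 (138) elements of this example, it recovers the two matches shown above.

```python
from typing import Tuple


def longest_common_substring(a: str, b: str) -> Tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of the longest exact common substring.

    Classic O(len(a) * len(b)) dynamic programming; ties keep the first hit.
    """
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: run length ending at a[i-2], b[j-1]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_a, best_b = i - best_len, j - best_len
        prev = cur
    return best_a, best_b, best_len


# C. elegans sequences listed above (identifiers 21719, 95 and 138).
C1C2_21719 = "ACGTTCTTAACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATC"
T1_95 = "CAGCACGTTCTTAACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC"
T2_138 = "ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATCA"

for name, element in (("T1", T1_95), ("T2", T2_138)):
    start, _, length = longest_common_substring(C1C2_21719, element)
    print(name, start, length, C1C2_21719[start:start + length])
```

The quadratic table is adequate for sequences of a few hundred bases; a genome-wide search would more likely rely on an index over the genome rather than on pairwise tables.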



-55- 



BNSDOCID: <WO 0194542A2_I_> 



WO 01/94542 



PCT/US01/16471 



A C1/C2 short loop on chromosome 5 whose identifier is 21949 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene F16B4.4 and has the DNA sequence

ACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGCATTTTTTGT
AGATCTACGTAGATCAAGCCGAAATGAGACACTCTGACACCACG 

The match between the T1 sequence and the C1/C2 sequence is

ACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC

The match between the T2 sequence and the C1/C2 sequence is 

ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATC

A C1/C2 short loop on chromosome 5 whose identifier is 21655 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene C39F7.3 and has the DNA sequence

AACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGCATTTTTTG
TAGATCTACG 

The match between the T1 sequence and the C1/C2 sequence is
AACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC
The match between the T2 sequence and the C1/C2 sequence is
ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATC
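In the 21655 entry above, the T1 match and the T2 match are not two separate blocks inside the C1/C2 sequence: the last 21 bases of the T1 match (ACTCTGCGTCTCTTCTCCCGC) are also the first 21 bases of the T2 match. The Python sketch below locates both matches by exact string search and reports the signed distance between them, so that a negative value means the two overlap. Reading C1 and C2 this way is an interpretation of the listed sequences; the function name and the returned fields are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Layout(NamedTuple):
    t1: Tuple[int, int]  # 0-based, half-open span of the T1 match in C1/C2
    t2: Tuple[int, int]  # 0-based, half-open span of the T2 match in C1/C2
    gap: int             # bases between the spans; negative means they overlap


def match_layout(c1c2: str, t1_match: str, t2_match: str) -> Optional[Layout]:
    """Locate the T1 and T2 matches inside a C1/C2 sequence, or None if absent."""
    i = c1c2.find(t1_match)
    j = c1c2.find(t2_match)
    if i < 0 or j < 0:
        return None
    t1 = (i, i + len(t1_match))
    t2 = (j, j + len(t2_match))
    return Layout(t1, t2, t2[0] - t1[1])


# Sequences from the 21655 entry above.
C1C2_21655 = "AACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATCTACG"
T1_MATCH = "AACCATGCAAAATCAGTTGAGAACTCTGCGTCTCTTCTCCCGC"
T2_MATCH = "ACTCTGCGTCTCTTCTCCCGCATTTTTTGTAGATC"

print(match_layout(C1C2_21655, T1_MATCH, T2_MATCH))
```

A gap of zero would mean the two matches abut exactly. The 21719 pair in this example shows the same 21-base overlap.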


2. Many Connectrons control the expression of one set of genes in prokaryotes,
archea, single-celled eukaryotes and multi-celled eukaryotes.

Many different C1/C2 short loops can control the existence of one T1-T2 long loop.
The C1/C2 short loops can be on the same chromosome as the T1-T2 long loop or on
different chromosomes. This relationship is described as "many-to-one". It
exists in prokaryotes, archea, single-celled eukaryotes and multi-celled
eukaryotes.
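The many-to-one relationship amounts to grouping control links by the long loop they act on. A minimal Python sketch, using the identifiers from the E. coli and M. jannaschii examples in this section; keying each long loop by organism plus its T1 and T2 identifiers is an assumption, since identifiers are only meaningful within one genome.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LongLoop = Tuple[str, int, int]  # (organism, T1 identifier, T2 identifier)

# (short-loop identifier, long loop it controls), taken from the examples.
LINKS: List[Tuple[int, LongLoop]] = [
    (3307, ("E. coli", 3197, 3308)),
    (3432, ("E. coli", 3197, 3308)),
    (2218, ("E. coli", 3197, 3308)),
    (1629, ("M. jannaschii", 1630, 1643)),
    (1642, ("M. jannaschii", 1630, 1643)),
    (124, ("M. jannaschii", 1630, 1643)),
    (1533, ("M. jannaschii", 1630, 1643)),
]


def group_by_long_loop(links: List[Tuple[int, LongLoop]]) -> Dict[LongLoop, List[int]]:
    """Collect, for each long loop, the short loops that control it."""
    groups: Dict[LongLoop, List[int]] = defaultdict(list)
    for short_id, long_loop in links:
        groups[long_loop].append(short_id)
    return dict(groups)


for loop, controllers in group_by_long_loop(LINKS).items():
    kind = "many-to-one" if len(controllers) > 1 else "one-to-one"
    print(loop, controllers, kind)
```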

Example of a many-to-one connectron in prokaryotes - E. coli

In this example the existence of the T1-T2 (3197-3308) long loop is controlled by 
three C1/C2 short loops (3307, 3432 and 2218). 



[Diagram: C1/C2 short loops 3307, 3432 and 2218 on chromosome 1 acting on the T1-T2 long loop 3197-3308 on chromosome 1.]



A double stranded DNA loop of length 93.542 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 3197. This T1 control
element has the DNA sequence

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTC 
AATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACG
CCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAA 
ATAAATGCTTGACTCTGTAGCGGGAA 



This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 3308. This T2 control element has the DNA sequence

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTG
ACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAG
AACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAG
GCGTATTATGCACACCCCGCGCCGCT



This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes

rrsC gltU rrlC rrfC aspT trpT
yifA yifE yifB ilvL ilvG_1 ilvM
ilvE ilvD ilvA ilvY ilvC ppiC
b3776 rep gppA rhlB trxA rhoL
rho rfe wzzE wecB rffH wecD
wecE wzxE yifM_2 wecG yifK argX
hisR leuT proM aslB aslA hemY
hemX hemD cyaA cyaY b3808 dapF
uvrD b3814 corA yigF yigG rarD
yigI pldA recQ yigK pldB yigL
yigM metR metE ysgA udp yigN
ubiE yigP b3836 yigU yigW_1 rfaH
yigC ubiB fadA fadB pepQ trkH
hemG

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 



A C1/C2 short loop on chromosome 1 whose identifier is 3307 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene hemG and has the DNA sequence

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGG 
AATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACG 
CCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAA
ATAAATGCTTGACTCTGTAGCGGGAAGGCGTATTATG...GGAGTCTGCAAC 
TCGACTCCATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACG
GTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGT 
GGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACT 
TTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAAC 
CTGCGGTTGGATCACCTCCTTACCTTAAAGAAGCGTTCTTTG 

The match between the T1 sequence and the C1/C2 sequence is

AAAAAATGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGG 
AATAACTCCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACG 
CCGCCGGGTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAA
ATAAATGCTTGACTCTGTAGCGGGAA 

The match between the T2 sequence and the C1/C2 sequence is 

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTG
ACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAG 
AACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAG 
GCGTATTATGCACACCCCGCGCCGCT 

A C1/C2 short loop on chromosome 1 whose identifier is 3432 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene btuB and has the DNA sequence

TGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT 
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGG
GTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATG
CTTGACTCTGTAGCGGGAAGGCGTATTATGCACACC..ACACCATGGGAGT
GGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACT
TTGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAAC
CTGCGGTTGGATCACCTCCTTACCTTAAAGAAGCGT

The match between the T1 sequence and the C1/C2 sequence is

TGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGG 
GTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATG 
CTTGACTCTGTAGCGGGAA 

The match between the T2 sequence and the C1/C2 sequence is 

TAAATTTCCTCTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTG 
ACACGGAACAACGGCAAACACGCCGCCGGGTCAGCGGGGTTCTCCTGAG 
AACTCCGGCAGAGAAAGCAAAAATAAATGCTTGACTCTGTAGCGGGAAG 
GCGTATTATGCACACCCCGCGCCGCT 

A C1/C2 short loop on chromosome 1 whose identifier is 2218 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene clpB and has the DNA sequence

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAAC 
AACGGCAAACACGCCGCCGGGC 

The match between the T1 sequence and the C1/C2 sequence is

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAAC 
AACGGCAAACACGCCGCCGGGC 

The match between the T2 sequence and the C1/C2 sequence is 

CTTGTCAGGCCGGAATAACTCCCTATAATGCGCCACCACTGACACGGAAC 
AACGGCAAACACGCCGCCGGGC 

Example of a many-to-one connectron in archea - M. jannaschii

In this example the existence of the T1-T2 (1630-1643) long loop is controlled by 
four C1/C2 short loops (1629, 1642, 124 and 1533). 

[Diagram: C1/C2 short loops 1629, 1642, 124 and 1533 on chromosome 1 acting on the T1-T2 long loop 1630-1643 on chromosome 1.]



A double stranded DNA loop of length 4.998 kilo-bases on chromosome 1 is bounded
on the left by a T1 sequence whose identifier is 1630. This T1 control element has
the DNA sequence

TTATTAATTAGTTCAAAGGATTTTTATTTAATTTCTAAGGGTTTGCTGGTT...
GATTATTTAGAATATTTGAGTTTATTGAA 

AGATTAATTAGGAAAGGAAATAAGATTTCTCTAACAGACAAGTTAAATTT 
TTGGATTTAAAAAGATAAAAAT 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 1643. This T2 control element has the DNA sequence 

TTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATT 
AATTATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAAT 
TTCTCTAACAAATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCT
GTTTTATTATGGAAAGAAAGAT 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

MJ1597 MJ1598 MJ1599 MJ1600 MJ1601 MJ1602

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 1629 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene MJ1597 and has the DNA sequence


ATATGTTTGAAATTTGAAAATAAGAGTATTTAGAAG 

AAGGATTTTTATTTAATTTC 

TTGAGTTTATTGAATTATTCA

The match between the T1 sequence and the C1/C2 sequence is

TTATTAATTAGTTCAAAGGATTTTTATTTA 
GATTATTTAGAATATTTGAGTTTATTGAATTATTCA 

The match between the T2 sequence and the C1/C2 sequence is

GCTGGTTTGATTATTTAGAATATTT...
AAAATTA 

A C1/C2 short loop on chromosome 1 whose identifier is 1642 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene MJ1602 and has the DNA sequence

ATTTAATTTCTAAGGGTTAGCTGGTTT 
TGAATTATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTA
ATTTCTCTAACAAATAAGTTAAATTTTTGGAT...
CTGTTTTATTATGGAAAGAAAGAT 

The match between the T1 sequence and the C1/C2 sequence is

GCTGGTTTGATTATTTAGAATATTTGAG 
AAAATTA

The match between the T2 sequence and the C1/C2 sequence is 

TTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTG 
AATTATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAAT
TTCTCTAACAAATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCT 
GTTTTATTATGGAAAGAAAGAT 

A C1/C2 short loop on chromosome 1 whose identifier is 124 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ0112 and has the DNA sequence

ATTTAATTTCTAAGGGTTTGCTGGTTTGATTATTTAGAATATT...
TGAATTATTCAGATTTTTAAAAT 

The match between the T1 sequence and the C1/C2 sequence is

ATTTAATTTCTAAGGGTTTGCTGGTTTGATTATTTAGAATATTTGAGTT...
TGAATTATTCAGATTTTTAAAAT 


The match between the T2 sequence and the C1/C2 sequence is 

GCTGGTTTGATTATTTAGAATATTTGAGTTTATTC 
AAAAT 

A C1/C2 short loop on chromosome 1 whose identifier is 1533 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene MJ1486 and has the DNA sequence

TTTTTATTTAATTTCT
TTTATT 

The match between the T1 sequence and the C1/C2 sequence is

TTTTTATTTAATTTCTAAGG
TTTATT 

The match between the T2 sequence and the C1/C2 sequence is 
GCTGGTTTGATTATTTAGAATATTTGAGTTTATT



Example of a many-to-one connectron in single-cell eukaryotes - S. cerevisiae


In this example the existence of the T1-T2 (5515-5533) long loop on chromosome 12 
is controlled by seventeen C1/C2 short loops (5516, 5532, 1939, 2323, 1942, 3286, 
3649, 4764, 4751, 5536, 6102, 8023, 7356, 3293, 3291, 3289 and 146). 

5516 Chromosome 12
5532 Chromosome 12
1939 Chromosome 4
2323 Chromosome 5
1942 Chromosome 5
3286 Chromosome 7
3649 Chromosome 8
4764 Chromosome 12
4751 Chromosome 12
5536 Chromosome 13
6102 Chromosome 14
8023 Chromosome 16
7356 Chromosome 16
3293 Chromosome 8
3291 Chromosome 8
3289 Chromosome 8
146 Chromosome 2

[Diagram: the seventeen C1/C2 short loops listed above acting on the T1-T2 long loop 5515-5533 on chromosome 12.]
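The list above can be tallied to show how many of the seventeen controlling short loops sit on the same chromosome as the long loop (chromosome 12) and how many act from other chromosomes. A short Python sketch using `collections.Counter`; the dictionary simply transcribes the list.

```python
from collections import Counter

# Short-loop identifier -> chromosome, transcribed from the list above.
SHORT_LOOP_CHROMOSOME = {
    5516: 12, 5532: 12, 1939: 4, 2323: 5, 1942: 5, 3286: 7,
    3649: 8, 4764: 12, 4751: 12, 5536: 13, 6102: 14, 8023: 16,
    7356: 16, 3293: 8, 3291: 8, 3289: 8, 146: 2,
}
LONG_LOOP_CHROMOSOME = 12

per_chromosome = Counter(SHORT_LOOP_CHROMOSOME.values())
same = per_chromosome[LONG_LOOP_CHROMOSOME]
other = len(SHORT_LOOP_CHROMOSOME) - same
print(f"{same} on chromosome {LONG_LOOP_CHROMOSOME}, {other} on other chromosomes")
print(sorted(per_chromosome.items()))
```

Thirteen of the seventeen short loops act from chromosomes other than chromosome 12, and chromosome 8 alone contributes four.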



A double stranded DNA loop of length 6.466 kilo-bases on chromosome 12 is
bounded on the left by a T1 sequence whose identifier is 5515. This T1 control
element has the DNA sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTG

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 5533. This T2 control element has the DNA sequence 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGC
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA
TCCGGGTAAGAGACAACAGGGCT
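The abstract states that T1 and T2 lie roughly 1 kb to 105 kb apart on the same chromosome. A candidate long loop can be screened against that window; the Python sketch below does so for the loop lengths quoted in this section. Treating the window as inclusive, with bounds of exactly 1 and 105 kb, is an assumption, since the abstract gives the range only approximately.

```python
# Loop lengths in kilobases, as quoted in the examples in this section.
LOOP_LENGTHS_KB = {
    "H. pylori 812-882": 96.385,
    "S. cerevisiae 1352-1416": 68.908,
    "C. elegans 95-138": 41.978,
    "E. coli 3197-3308": 93.542,
    "M. jannaschii 1630-1643": 4.998,
    "S. cerevisiae 5515-5533": 6.466,
}

# Approximate T1-T2 separation window from the abstract (assumed inclusive).
MIN_KB, MAX_KB = 1.0, 105.0


def within_window(length_kb: float, lo: float = MIN_KB, hi: float = MAX_KB) -> bool:
    """True if a T1-T2 separation falls inside the stated window."""
    return lo <= length_kb <= hi


for loop, kb in LOOP_LENGTHS_KB.items():
    print(f"{loop}: {kb} kb -> {'ok' if within_window(kb) else 'outside window'}")
```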

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

YLR467W 

This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops 

A C1/C2 short loop on chromosome 12 whose identifier is 5516 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene YLR464W and has
the DNA sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTC 

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 



A C1/C2 short loop on chromosome 12 whose identifier is 5532 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene YLR467W and has
the DNA sequence



AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCT 

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT



The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is 1939 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YDR545W and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGT...

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGG 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGG

The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGC 
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC 
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGG 

A C1/C2 short loop on chromosome 5 whose identifier is 2323 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YER189W and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAAT...

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGAA 

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG

The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGC
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA
TCCGGGTAAGAGACAACAGGGCT


A C1/C2 short loop on chromosome 5 whose identifier is 1942 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YEL077C and has the DNA
sequence


AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT 
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTG

The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA 
ATATGCGTTTTGATGTAGTAGTATTTCACTC 

ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA
TCCGGGTAAGAGACAACAGGGCT

A C1/C2 short loop on chromosome 7 whose identifier is 3286 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YGR296W and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGA 

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence is 


ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA 
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGC 
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC 
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA 
TCCGGGTAAGAGACAACAGGGCT

A C1/C2 short loop on chromosome 8 whose identifier is 3649 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene YHR219W and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT 
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGC 
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC 
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA 
TCCGGGTAAGAGACAACAGGGCT

A C1/C2 short loop on chromosome 12 whose identifier is 4764 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene YLL066C and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGG

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGC
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA 
TCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 12 whose identifier is 4751 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YLL067C and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGA 

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA

AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA 
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTG 
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA 
TCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 13 whose identifier is 5536 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YML133C and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT 
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG 



The match between the T2 sequence and the C1/C2 sequence is

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA 
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGT
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA 
TCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 14 whose identifier is 6102 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YNL339C and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT 
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG

The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAA 
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTT

ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC 
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA 
TCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 16 whose identifier is 8023 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YPR204W and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCG 

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT 
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG

The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA 
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGC
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC 
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA 
TCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 16 whose identifier is 7356 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YPL283C and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAAT

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTGAACATGCGGGTAAGAGACAACAGGGCT 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT 
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence is

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA 
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGC
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC 
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA 
TCCGGGTAAGAGACAACAGGGCT 

A C1/C2 short loop on chromosome 8 whose identifier is 3293 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YHL050C and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTT

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTT 

The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGG
ATATGCGTTTT 

A C1/C2 short loop on chromosome 8 whose identifier is 3291 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene YHL050C and has the DNA
sequence

ATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAA 

The match between the T1 sequence and the C1/C2 sequence is

ATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC 
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAA 

The match between the T2 sequence and the C1/C2 sequence is 

ATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGC 
GAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAA 

A C1/C2 short loop on chromosome 2 whose identifier is 145 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YBL113C and has the DNA sequence

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGCT 

The match between the T1 sequence and the C1/C2 sequence is

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG 

The match between the T2 sequence and the C1/C2 sequence is

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGCT 

A C1/C2 short loop on chromosome 8 whose identifier is 3289 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YHL050C and has the DNA
sequence

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGCT

The match between the T1 sequence and the C1/C2 sequence is

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTG

The match between the T2 sequence and the C1/C2 sequence is 

CTATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAAC 
ATCCGGGTAAGAGACAACAGGCT 

A C1/C2 short loop on chromosome 2 whose identifier is 146 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YBL113C and has the DNA sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAA 

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAA 

The match between the T2 sequence and the C1/C2 sequence is 
ATTATGTATTGTGTAGTATAGTATATTGTAAGAAA

Example of a many-to-one connectron in multi-cell eukaryotes - C. elegans

In this example the existence of the T1-T2 (3197-3308) long loop on chromosome 5 
is controlled by three C1/C2 short loops (4382, 4375 and 28633). 



[Diagram: C1/C2 short loops 4382 and 4375 (chromosome 1) and 28633 (chromosome 5) with the T1-T2 long loop bounded by 28632 and 28697 on chromosome 5]



A double stranded DNA loop of length 58.451 kilo-bases on chromosome 5 is
bounded on the left by a T1 sequence whose identifier is 28632. This T1 control
element has the DNA sequence

GCAAAAATTGACTGAAAATTTGAATTTCCCGCAAAAAATTGACTGAAAAT 
TTGAATTTCCCGCCAAAAATTGACTGAAAATTTGAA 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 28697. This T2 control element has the DNA sequence 

CAAAAAATTGACTGAAAATTTGAATTTCCCTCCAAAAATTGACTGAAAAT 

TTGAATTTCCCGCCAAAAATTGACTGAAAATTTGAATATCCCGCCAAAAA 

TTGACTGAAAATTTGAATTTCCCGCCGAAAATTAAATGAAAAATGGAATT 
TCTCGCCGAA 



This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

M162.8 M162.4 M162.3 M162.6 M162.2 M162.1 M162.7
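Each example pairs a loop length in kilo-bases with the list of genes the loop modulates, and both follow from chromosome coordinates. A minimal sketch, assuming the length is measured from the end of the T1 element to the start of the T2 element and that a gene counts as lying in the loop only if it sits entirely between them. The `Gene` record, the helper names and the coordinates are hypothetical; the 1 kb to 105 kb window is the approximate T1-T2 separation range given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # coordinate of the first base (hypothetical data)
    end: int    # coordinate of the last base


def loop_length_kb(t1_end: int, t2_start: int) -> float:
    """Distance from the end of T1 to the start of T2, in kilo-bases (assumed convention)."""
    return (t2_start - t1_end) / 1000.0


def genes_in_loop(genes, t1_end: int, t2_start: int):
    """Names of genes lying entirely between the T1 and T2 elements."""
    return [g.name for g in genes if g.start > t1_end and g.end < t2_start]


def plausible_loop(length_kb: float, min_kb: float = 1.0, max_kb: float = 105.0) -> bool:
    """True if the loop falls in the approximate 1 kb to 105 kb range from the abstract."""
    return min_kb <= length_kb <= max_kb


# Hypothetical coordinates, for illustration only.
genes = [Gene("A", 500, 900), Gene("B", 1500, 2500), Gene("C", 9000, 12000)]
print(loop_length_kb(1000, 10000))        # 9.0
print(genes_in_loop(genes, 1000, 10000))  # ['B']
```

Gene C straddles the T2 boundary and is excluded under this convention; the text does not state whether partially enclosed genes were counted in the gene lists above.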

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 4382 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene Y43F8B.10 and has the DNA
sequence 

ATTATAGAAAATTTAAATTTCCCTCCAAAAAATTGACTGAAAATTTGAATT 

TCCCTCCAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAATTGACTG 

AAAATTTGAATATCCCGCCAAAAATTGACTGAAAATTTGAATTTCCCGCC

GAAAATTAAATGAAAAATGGAATTTCTCGCCGAAAAATTCAGTAAAAATT

TGAATTTCCTGCCAAAAATTGACTGAAAATTTGAATTTCTTGCCAAAAAA

GTGACTGGGAATTTGAATTTCCCTCCAAAAATTGACTGAAATTTTGAATTT
CCCGCTAAAAGTTGACT 

The match between the T1 sequence and the C1/C2 sequence is

CAAAAATTGACTGAAAATTTGAATTTCCCGC 

The match between the T2 sequence and the C1/C2 sequence is 

CAAAAAATTGACTGAAAATTTGAATTTCCCTCCAAAAATTGACTGAAAAT 

TTGAATTTCCCGCCAAAAATTGACTGAAAATTTGAATATCCCGCCAAAAA

TTGACTGAAAATTTGAATTTCCCGCCGAAAATTAAATGAAAAATGGAATT 
TCTCGCCGAA 

A C1/C2 short loop on chromosome 1 whose identifier is 4375 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene Y43F8B.10 and has the DNA
sequence 

ATTATAGAAAATTTAAATTTCCCTCCAAAAAATTGACTGAAAATTTGAATT 

TCCCTCCAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAATTGACTG 

AAAATTTGAATATCCCGCCAAAAATTGACTGAAAATTTGAATTTCCCGCC 

GAAAATTAAATGAAAAATGGAATTTCTCGCCGAAAAATTCAGTAAAAATT 

TGAATTTCCTGCCAAAAATTGACTGAAAATTTGAATTTCTTGCCAAAAAA 

GTGACTGGGAATTTGAATTTCCCTCCAAAAATTGACTGAAATTTTGAATTT 

CCCGCTAAAAGTTGACT 

The match between the T1 sequence and the C1/C2 sequence is

CAAAAATTGACTGAAAATTTGAATTTCCCGC 

The match between the T2 sequence and the C1/C2 sequence is 

CAAAAAATTGACTGAAAATTTGAATTTCCCTCCAAAAATTGACTGAAAAT 
TTGAATTTCCCGCCAAAAATTGACTGAAAATTTGAATATCCCGCCAAAAA 
TTGACTGAAAATTTGAATTTCCCGCCGAAAATTAAATGAAAAATGGAATT 
TCTCGCCGAA 

A C1/C2 short loop on chromosome 5 whose identifier is 28633 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene M162.5 and has the DNA sequence

CAAAAATTGACTGAAAATTTGAATTTCCCGCAAAAAATTGACTGAAAATT 
TGAATTTCCCGCCAAAAATTGACTGAAAATTTGAA 

The match between the T1 sequence and the C1/C2 sequence is

CAAAAATTGACTGAAAATTTGAATTTCCCGCAAAAAATTGACTGAAAATT
TGAATTTCCCGCCAAAAATTGACTGAAAATTTGAA 

The match between the T2 sequence and the C1/C2 sequence is 
CAAAAAATTGACTGAAAATTTGAATTTCCC 
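Because C1 is described as adjacent to C2 within the short loop, it can help to check where the reported T1 and T2 matches fall inside the C1/C2 sequence and whether they abut, leave a gap or overlap (the T1 and T2 matches overlap in several of the examples above). The sketch below is hypothetical: `locate_matches` uses plain exact search with `str.find`, so it reports only the first occurrence of each match, as half-open (start, end) offsets.

```python
def locate_matches(c1c2: str, t1_match: str, t2_match: str):
    """Find the T1 and T2 matches inside a C1/C2 sequence.

    Returns ((t1_start, t1_end), (t2_start, t2_end), gap), where gap is the
    number of bases between the two matches (0 = adjacent, negative =
    overlapping), or None if either match is absent.
    """
    i1 = c1c2.find(t1_match)  # first exact occurrence only
    i2 = c1c2.find(t2_match)
    if i1 < 0 or i2 < 0:
        return None
    t1_span = (i1, i1 + len(t1_match))
    t2_span = (i2, i2 + len(t2_match))
    first, second = sorted([t1_span, t2_span])  # order by position, not by name
    gap = second[0] - first[1]
    return t1_span, t2_span, gap


print(locate_matches("AAAACCCCGGGG", "AAAA", "GGGG"))  # ((0, 4), (8, 12), 4)
```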



3. One connectron controls the expression of many sets of genes in prokaryotes,
archea, single-celled eukaryotes and multi-celled eukaryotes. 

One C1/C2 short loop can control the existence of many T1-T2 long loops. The
C1/C2 short loop can be on the same chromosome as the T1-T2 long loops or on a
different chromosome. This relationship is described as "one-to-many". This
relationship exists in prokaryotes, archea, single-celled eukaryotes and multi-celled
eukaryotes.
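The one-to-many relationship, and the many-to-one relationship of the C. elegans example, can both be read off a list of (C1/C2 short loop, T1-T2 long loop) links by grouping in each direction. The sketch below is illustrative only: `LINKS` restates the E. coli and C. elegans pairings given in the examples, identifiers are keyed by organism on the assumption that they are numbered separately within each genome, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict

# ((organism, C1/C2 id), (organism, T1 id, T2 id)) pairs restated from the examples.
LINKS = [
    (("E. coli", 3206), ("E. coli", 3208, 3315)),
    (("E. coli", 3206), ("E. coli", 3436, 3476)),
    (("E. coli", 3206), ("E. coli", 3439, 3478)),
    (("E. coli", 3206), ("E. coli", 3441, 3479)),
    (("C. elegans", 4382), ("C. elegans", 28632, 28697)),
    (("C. elegans", 4375), ("C. elegans", 28632, 28697)),
    (("C. elegans", 28633), ("C. elegans", 28632, 28697)),
]


def summarize(links):
    """Return (one_to_many, many_to_one) groupings of short-loop/long-loop links."""
    by_short = defaultdict(set)  # C1/C2 short loop -> T1-T2 long loops it controls
    by_long = defaultdict(set)   # T1-T2 long loop -> C1/C2 short loops controlling it
    for short_loop, long_loop in links:
        by_short[short_loop].add(long_loop)
        by_long[long_loop].add(short_loop)
    one_to_many = {s: sorted(ls) for s, ls in by_short.items() if len(ls) > 1}
    many_to_one = {lg: sorted(ss) for lg, ss in by_long.items() if len(ss) > 1}
    return one_to_many, many_to_one


one_to_many, many_to_one = summarize(LINKS)
for short_loop, long_loops in one_to_many.items():
    print("one-to-many:", short_loop, "->", long_loops)
for long_loop, short_loops in many_to_one.items():
    print("many-to-one:", short_loops, "->", long_loop)
```

Keeping both directions in one pass means a single table of links can answer either question, which suits a genome-wide search where each short loop and each long loop may appear in several links.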



Example of a one-to-many connectron in prokaryotes - E. coli

In this example the existence of T1-T2 (3208-3315, 3436-3476, 3439-3478 and
3441-3479) long loops is controlled by one C1/C2 short loop (3206).

[Diagram: C1/C2 short loop 3206 on chromosome 1 with the T1-T2 long loops 3208-3315, 3436-3476, 3439-3478 and 3441-3479 on chromosome 1]

A double stranded DNA loop of length 93.377 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 3208. This T1 control
element has the DNA sequence

ACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCA 
AGCTGAAAATTGAAACACTGAACAACGAAAGTTGTTCGTGAGTCTCTCAA 
ATTTTCGCAACACGATGATGAATCGAAAGAAACATCTTCGGGTTGTGAGG
TTAAGCGACTAAGCGTACACGGTGGATGCCCTGGC...AGTGTGTTTCGACA 
CACTATCATTAACTGAATCCATAGGTTAATGAGGCGAACCGGGGGAACTG 
AAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTA 
GCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGAATCAGT 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 3315. This T2 control element has the DNA sequence 

TTTGCTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAACACTGAACAAC 
GAAAGTTGTTCGTGAGTCTCTCAAATTTTCGCAACTCTGAAGTGAAACATC
TTCGGGTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGATGCCCTGGCA 
GTCAGAGGCGATGAAGGACGTGCTAATCTGCGATA...GGTTAATGAGGCG 
AACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACC 
GAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGA 
ATCAGTGTGTGTGTTAGTGGAAGCGTCTGGAAA

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

rrlC rrfC aspT trpT yifA yiffi yifB ilvL ilvG_1

ilvM ilvE ilvD ilvA ilvY ilvC ppiC b3776 rep 

gppA rhlB trxA rhoL rho rfe wzzE wecB rffH 
wecD wecE wzxE yifM_2 wecG yifK argX hisR
leuT proM aslB aslA hemY hemX hemD cyaA 
cyaY b3808 dapF uvrD b3814 corA yigF yigG rarD 
yigI pldA recQ yigJ yigK pldB yigL yigM metR
metE ysgA udp yigN ubiE yigP b3836 yigU 

yigW_1 rfaH yigC ubiB fadA fadB pepQ trkH
hemG rrsA ileT 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 3206 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene rrsC and has the DNA sequence

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAG 
GGGTTCGAATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTC 
ACCTGCCTTAATATCTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTG 
CTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAA...ACCGGCGATTTCCG
AATGGGGAAACCCAGTGTGTTTCGACACACTATCATTAACTGAATCCATA 
GGTTAATGAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGA 
AAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCA 
GCCCAGAGCCTGAATCAGT 

The match between the T1 sequence and the C1/C2 sequence is

ACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTTAAAAATCTG

AGCTGAAAATTGAAACACTGAACAACGAAAGTTGTTCGTGAGTCTCTCAA 

ATTTTCGCAACACGATGATGAATCGAAAGAAACATCTTCGGGTTGTGAGG

TTAAGCGACTAAGCGTACACGGTGGATGCCCTGGC...AGTGTGTTTCGACA 

CACTATCATTAACTGAATCCATAGGTTAATGAGGCGAACCGGGGGAACTG 
AAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTA
GCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGAATCAGT 

The match between the T2 sequence and the C1/C2 sequence is 

TTTGCTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAACACTGAACAAC 
GAAAGTTGTTCGTGAGTCTCTCAAATTTTCGCAAC 



A double stranded DNA loop of length 41.279 kilo-bases on chromosome 1 is 
bounded on the left by a T1 sequence whose identifier is 3436. This T1 control
element has the DNA sequence 

ACGCAACGCGTGATAAGCAATTTTCGTGTCCCCTTCGTCTAGAGGCCCAG 

GACACCGCCCTTTCACGGCGGTAACAGGGGTTCGAATCCCCTAGGGGACG 

CCACTTGCTGGTT 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 3476. This T2 control element has the DNA sequence 

AGTGAAAAGCAAGGCGTCTTGCGAAGCAGACTGATACGTCCCCTTCGTCT 
AGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAGGGGTTCGAATCCC 
CTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTCACCTGCCTTAATA 



This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is 3206 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene rrsC and has the DNA sequence

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAG 
GGGTTCGAATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTC
ACCTGCCTTAATATCTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTG 
CTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAACACTGAACAACGAAA 
GTTGTTCGTGAGTCTCTCAAATTTTCGCAACACGATGATGAATCGAAAGA 
AACATCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGATGCC 
CTGGCAGTCAGAGGCGATGAAGGACGTGCTAATCTGCGATAAGCGTCGGT

AAGGTGATATGAACCGTTATAACCGGCGATTTCCGAATGGGGAAACCCAG 
TGTGTTTCGACACACTATCATTAACTGAATCCATAGGTTAATGAGGCGAA 
CCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAACCGA 
GATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTGAAT 
CAGT

The match between the T1 sequence and the C1/C2 sequence is

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAG 
GGGTTCGAATCCCCTAGGGGACGCCACTTGCTGGTT

The match between the T2 sequence and the C1/C2 sequence is 

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAG 
GGGTTCGAATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTC
ACCTGCCTTAATA

A double stranded DNA loop of length 41.336 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 3439. This T1 control
element has the DNA sequence

CCTTAATATCTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTT 
AAAAATCTGGATCAAGCTGAAAATTGAAACACTGAACAACGA 

This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 3478. This T2 control element has the DNA sequence

GTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCAAGCTGAAAATTG
AAACACTGAACAACGAAAGTTGTTCGTGAGTCTCTCAAATTTT 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 3206 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene rrsC and has the DNA sequence

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAG 
GGGTTCGAATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTC 
ACCTGCCTTAATATCTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTG
CTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAA...ACCGGCGATTTCCG
AATGGGGAAACCCAGTGTGTTTCGACACACTATCATTAACTGAATCCATA 
GGTTAATGAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGA 
AAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCA
GCCCAGAGCCTGAATCAGT 

The match between the T1 sequence and the C1/C2 sequence is

CCTTAATATCTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTGCTCTTT

AAAAATCTGGATCAAGCTGAAAATTGAAACACTGAACAACGA 

The match between the T2 sequence and the C1/C2 sequence is 

GTGATGTTTGAGATATTTGCTCTTTAAAAATCTGGATCAAGCTGAAAATTG

AAACACTGAACAACGAAAGTTGTTCGTGAGTCTCTCAAATTTT 



A double stranded DNA loop of length 38.285 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 3441. This T1 control
element has the DNA sequence

AATTTTCGCAACACGATGATGAATCGAAAGAAACATCTTCGGGTTGTGAG 
GTTAAGCGACTAAGCGTACACGGTGGATGCCCTGGCAGTCAGAGGCGATG
AAGGACGTGCTAATCTGCGATAAGCGTCGGTAAGGTGATATGAACCGTTA 
TAACCGGCGATTTCCGAATGGGGAAACCCAGTGTGT...GATGAGAGAAGA 
TTTTCAGCCTGATACAGATTAAATCAGAACGCAGAAGCGGTCTGATAAAA 
CAGAATTTGCCTGGCGGCAGTAGCGCGGTGGTCCCACCTGACCCCATGCC 
GAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTCCCC
ATGCGAG 

This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 3479. This T2 control element has the DNA sequence 

AAGAAACATCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGA 
TGCCCTGGCAGTCAGAGGCGATGAAGGACGTGCTAATCTGCGATAAGCGT
CGGTAAGGTGATATGAACCGTTATAACCGGCGATTTCCGAATGGGGAAAC 
CCAGTGTGTTTCGACACACTATCATTAACTGAATCC...CAGATTAAATCAG 
AACGCAGAAGCGGTCTGATAAAACAGAATTTGCCTGGCGGCAGTAGCGC 
GGTGGTCCCACCTGACCCCATGCCGAACTCAGAAGTGAAACGCCGTAGCG 
CCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTAGGGAACTGCCAGGCA

TCAAATTA 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 3206 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene rrsC and has the DNA sequence

GTCCCCTTCGTCTAGAGGCCCAGGACACCGCCCTTTCACGGCGGTAACAG 
GGGTTCGAATCCCCTAGGGGACGCCACTTGCTGGTTTGTGAGTGAAAGTC 
ACCTGCCTTAATATCTCAAAACTCATCTTCGGGTGATGTTTGAGATATTTG
CTCTTTAAAAATCTGGATCAAGCTGAAAATTGAAA...ACCGGCGATTTCCG
AATGGGGAAACCCAGTGTGTTTCGACACACTATCATTAACTGAATCCATA 
GGTTAATGAGGCGAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGA 
AAAGAAATCAACCGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCA 
GCCCAGAGCCTGAATCAGT 

The match between the T1 sequence and the C1/C2 sequence is

AATTTTCGCAACACGATGATGAATCGAAAGAAACATCTTCGGGTTGTGAG 
GTTAAGCGACTAAGCGTACACGGTGGATGCCCTGGCAGTCAGAGGCGATG 
AAGGACGTGCTAATCTGCGATAAGCGTCGGTAAGGTGATATGAACCGTTA 
TAACCGGCGATTTCCGAATGGGGAAACCCAGTGTGTTTCGACACACTATC

ATTAACTGAATCCATAGGTTAATGAGGCGAACCGGGGGAACTGAAACATC 
TAAGTACCCCGAGGAAAAGAAATCAACCGAGATTCCCCCAGTAGCGGCG 
AGCGAACGGGGAGCAGCCCAGAGCCTGAATCAGT 

The match between the T2 sequence and the C1/C2 sequence is

AAGAAACATCTTCGGGTTGTGAGGTTAAGCGACTAAGCGTACACGGTGGA 
TGCCCTGGCAGTCAGAGGCGATGAAGGACGTGCTAATCTGCGATAAGCGT 
CGGTAAGGTGATATGAACCGTTATAACCGGCGATTTCCGAATGGGGAAAC 
CCAGTGTGTTTCGACACACTATCATTAACTGAATCCATAGGTTAATGAGGC
GAACCGGGGGAACTGAAACATCTAAGTACCCCGAGGAAAAGAAATCAAC 
CGAGATTCCCCCAGTAGCGGCGAGCGAACGGGGAGCAGCCCAGAGCCTG 
AATCAGT 

Example of a one-to-many connectron in archea - M. jannaschii

In this example the existence of T1-T2 (534-611, 1139-1159, and 1630-1643) long
loops is controlled by one C1/C2 short loop (1642).

[Diagram: C1/C2 short loop 1642 on chromosome 1 and the T1-T2 long loops it controls]

A double stranded DNA loop of length 72.886 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 534. This T1 control
element has the DNA sequence

TAAGTAAATAAAATTTCTCTAACAAATAAGTTAAATT 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 611. This T2 control element has the DNA sequence

TAAATAAAATTTCTCTAACAAATAAGTTA 
AAAATGCT 



This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes

MJ0486 MJ0487 MJ0488 MJ0489 MJ0490 MJ0492 MJ0493 

MJ0494 MJ0495 MJ0496 MJ0497 MJ0499 MJ0500 MJ0501 



MJ0502 MJ0503 MJ0504 MJ0506 MJ0507 MJ0508 MJ0509
MJ0510 MJ0511 MJ0512 MJ0513 MJ0514 MJ0514 MJ0517
MJ0519 MJ0520 MJ0521 MJ0522 MJ0523 MJ0525 MJ0526
MJ0526 MJ0529 MJ0530 MJ0531 MJ0532 MJ0534 MJ0535
MJ0536 MJ0538 MJ0539 MJ0540 MJ0541 MJ0542 MJ0543
MJ0544 MJ0545 MJ0547 MJ0548 MJ0549 MJ0550 MJ0552
MJ0553 MJ0554 MJ0555 MJ0556 MJ0558 MJ0559 MJ0560
MJ0561 MJ0562 MJ0563 MJ0564

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 1642 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene MJ1602 and has the DNA sequence

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTAT 
TGAATTATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAA 
ATTTCTCTAACAAATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACT 
CTGTTTTATTATGGAAAGAAAGAT

The match between the T1 sequence and the C1/C2 sequence is

AAGTAAATAAAATTTCTCTAACAAATAAGTTAAATT 

The match between the T2 sequence and the C1/C2 sequence is 

TAAATAAAATTTCTCTAACAAATAAGTTAAATTTTTGGATTTAAAAAGATA 
AAAAT 

A double stranded DNA loop of length 14.509 kilo-bases on chromosome 1 is 
bounded on the left by a T1 sequence whose identifier is 1139. This T1 control
element has the DNA sequence 

ATTTATTAATTAGTTCAAAGGAT 
TTTGATTGTTTAAAATATTTGAGTTTA 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 1159. This T2 control element has the DNA sequence

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTT
TGAATTATTCAGATTTTTAAAAATTA 



This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes

MJ1096 MJ1097 tRNA-Arg-3 MJ1098 MJ1099 MJ1100 MJ1101
MJ1102 MJ1103 MJ1104 MJ1105 MJ1106 MJ1107 MJ1108 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 1642 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene MJ1602 and has the DNA sequence

ATTTAATTTCTAAGGGTTAGCTGGTTTC
TGAATTATTCAGATTTTTAAAAATTAGGATTA 
ATTTCTCTAACAAATAAGTTAAATTTTTGG 
CTGTTTTATTATGGAAAGAAAGAT

The match between the T1 sequence and the C1/C2 sequence is

ATTTAATTTCTAAGGGTTAGCTGGTTTGATT

The match between the T2 sequence and the C1/C2 sequence is 

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTAT
TGAATTATTCAGATTTTTAAAAATTA 



A double stranded DNA loop of length 4.998 kilo-bases on chromosome 1 is bounded 
on the left by a T1 sequence whose identifier is 1630. This T1 control element has
the DNA sequence 

TTATTAATTAGTTCAAAGGATTTTTATTTAATTTCTAAGGGTTTGCTGGTTT

GATTATTTAGAATATTTGAGTTTATTGAATTATTCAGATTTTTAAAAATTA
AGATTAATTAGGAAAGGAAATAAGATTTCTCTAACAGACAAGTTAAATTT 
TTGGATTTAAAAAGATAAAAAT 

This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 1643. This T2 control element has the DNA sequence

TTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGAGTTTATTG 
AATTATTCAGATTTTTAAAAATTAGGATTAATTAGGCAAGTAAATAAAAT 
TTCTCTAACAAATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACTCT

GTTTTATTATGGAAAGAAAGAT 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

MJ1597 MJ1598 MJ1599 MJ1600 MJ1601 MJ1602 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 1642 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene MJ1602 and has the DNA sequence

ATTTAATTTCTAAGGGTTAGCTGG 
TGAATTATTCAGATTTTTAAAAATT 
ATTTCTCTAACAAATAAGTTAAATTTTTGGATTTAAAAAGATAAAAATACT

CTGTTTTATTATGGAAAGAAAGAT 

The match between the T1 sequence and the C1/C2 sequence is

GCTGGTTTGATTATTTAGAATATTTGAGTTTATTGAATTATTCAGATTTTTA

AAAATTA 

The match between the T2 sequence and the C1/C2 sequence is 

TTAATTTCTAAGGGTTAGCTGGTTTGATTATTTAGAATATTTGA
AATTATTCAGATTTTTAAAAATTA 
TTCTCTAACAAATAAGTTAAATTTTTGGATTTAAA
GTTTTATTATGGAAAGAAAGAT 

Example of a one-to-many connectron in single-cell eukaryotes - S. cerevisiae

In this example the existence of T1-T2 (158-171, 293-317, 4295-4308 and 5916-5923)
long loops is controlled by one C1/C2 short loop (86).

[Diagram: C1/C2 short loop 86 on chromosome 1 and the T1-T2 long loops it controls]

A double stranded DNA loop of length 20.391 kilo-bases on chromosome 2 is
bounded on the left by a T1 sequence whose identifier is 158. This T1 control
element has the DNA sequence

CCAATTGTTGGAATAAAAATCAACTATCATCTACTAACTAGTATTTACG
ACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAATGATGAGAA 
ATAGTCATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAAT 
AG 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 171. This T2 control element has the DNA sequence 
ATAATTGTTGGAATAAAAATCAACTATCATCTACTAACTAGTATTTACGTT
ACTAGTATATTATCATATACGGTGTTAGAAGATGACACAAATGATGAGAA 
ATAGTCATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAAT 
AGGATCAATGAATATTAACATATAAAATGATGATAATAATA 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

YBL107W-A TL(UAA)B1 YBL107C YBL106C YBL105C YBL104C 
YBL103C YBL102W YBL101C 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 86 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YAR009C and has the DNA sequence

ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTAC 

TAACTAGTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATG 

ACGCAAATGATGAGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCA 

AGGATTGATAATGTAATAGGATCAATGAATATAAACATATAAAACGGAAT 

GAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGG 

ATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTAATATT 

ATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTCACCCAT 

TTCTCAGAA 

The match between the T1 sequence and the C1/C2 sequence is
AAATCAACTATCATCTACTAACTAGTATTTAC 
The match between the T2 sequence and the C1/C2 sequence is
AAATCAACTATCATCTACTAACTAGTATTTAC 
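Each example reports the stretch of the C1/C2 sequence that matches a T sequence, reflecting the requirement that the C1 sequence be identical to the T1 sequence and the C2 sequence identical to the T2 sequence. The matching procedure itself is not given in this section. As a minimal sketch, assuming that a match is an exact, gap-free run of shared bases, the standard longest-common-substring dynamic program finds the longest such run. The function name `longest_common_substring` is hypothetical, and the sketch is not claimed to reproduce the matches listed in these examples.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of bases that occurs identically in both a and b.

    Hypothetical helper: it treats a "match" as an exact common substring.
    Runs in O(len(a) * len(b)) time, keeping only two rows of the DP table.
    """
    a, b = a.upper(), b.upper()
    best_len, best_end = 0, 0  # length of the best match and its end index in a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ends at a[i - 2] and b[j - 2].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# Short fragments copied from T1 158 and C1/C2 86 of the S. cerevisiae example.
t1_fragment = "GGAATAAAAATCAAC"
c1c2_fragment = "GGAATAGAAATCAAC"
print(longest_common_substring(t1_fragment, c1c2_fragment))  # AAATCAAC
```

Because every position of one sequence is compared with every position of the other, this is practical for element-sized sequences like these; a genome-scale search would call for an index such as a suffix array.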

A double stranded DNA loop of length 38.470 kilo-bases on chromosome 2 is
bounded on the left by a T1 sequence whose identifier is 293. This T1 control
element has the DNA sequence


GAATTGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTA 
TCAATATATTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAA 
GCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAAT 
AGGATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGT 
ATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTTGAGGAGAAC

TTCTAGT 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 317. This T2 control element has the DNA sequence 


AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTC 
GAGGAGAACTTCTAGTATATTCTGTA 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes

YBL005W-B TS(AGA)B YBL004W YBL003C YBL002W YBL001C 
YBR001C YBR002C YBR003W YBR004C YBR005W YBR006W 
YBR007C YBR008C YBR009C YBR010W YBR011C YBR012C 


The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 
A C1/C2 short loop on chromosome 1 whose identifier is 86 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YAR009C and has the DNA sequence


ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTAC 
TAACTAGTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATG 
ACGCAAATGATGAGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCA 
AGGATTGATAATGTAATAGGATCAATGAATATAAACATATAAAACGGAAT 
GAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGG

ATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTAATATT 
ATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTCACCCAT 
TTCTCAGAA 

The match between the T1 sequence and the C1/C2 sequence is

AAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAAT 
ATAGATTCCATTTTGAGGATTCCTATATCCT 



The match between the T2 sequence and the C1/C2 sequence is



AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTC
GAGGAGAACTTCTAGTATATTCTGTA






A double stranded DNA loop of length 11.020 kilo-bases on chromosome 10 is 
bounded on the left by a T1 sequence whose identifier is 4295. This T1 control
element has the DNA sequence 
AAACGCAAGGATTGATAATGTAATAGGATCAATGAATATAAACATATAAA
ACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCAT 
TTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTG

This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 4308. This T2 control element has the DNA sequence

GGAAGCTGAAACGCAAGGATTGATAATGTAATAGGATCAATGAATATAA 
ACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATAT 
AGATTCCATTTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATATT

CTGTATACCTAATATTATAGCCTTTATCAA 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 


YJR027W YJR029W 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 


A C1/C2 short loop on chromosome 1 whose identifier is 87 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YAR009C and has the DNA sequence

ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTAC
TAACTAGTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATG 
ACGCAAATGATGAGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCA 
AGGATTGATAATGTAATAGGATCAATGAATATAAACATATAAAACGGAAT 
GAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGG 

ATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTAATATT
ATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTCACCCAT 
TTCTCA 
A double stranded DNA loop of length 5.462 kilo-bases on chromosome 13 is
bounded on the left by a T1 sequence whose identifier is 5916. This T1 control
element has the DNA sequence

AAGCTGAAGTGCAAGGATTGATAATGTAATAGGATAATGAAACATATAA 
AACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCA 
TTTTGAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTA

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 5923. This T2 control element has the DNA sequence 

TAATAGGATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATT
AGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAG 
AACTTCTAGTATATTCTGTATACCTAATATTATAGCCTTTATCAA 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes

YML045W 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 87 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YAR009C and has the DNA sequence






ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTAC 
TAACTAGTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATG 
ACGCAAATGATGAGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCA
AGGATTGATAATGTAATAGGATCAATGAATATAAACATATAAAACGGAAT 
GAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGG 
ATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTAATATT 
ATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTCACCCAT
TTCTCA 



Example of a one-to-many connectron in multi-celled eukaryotes - C. elegans

In this example the existence of the T1-T2 long loops (16554-16661 and 21565-21590)
is controlled by one C1/C2 short loop (21591).
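A one-to-many connectron amounts to a mapping from one C1/C2 short loop to several T1-T2 long loops. The sketch below assumes a hypothetical record layout of (C1/C2 id, T1 id, T2 id) tuples and uses the identifiers of this example; the function names are illustrative only, and the code shows the data structure, not how the connectrons were found.

```python
from collections import defaultdict


def group_by_short_loop(records):
    """Map each C1/C2 short-loop id to the sorted T1-T2 long loops it controls.

    records: iterable of (c_id, t1_id, t2_id) tuples (hypothetical layout).
    """
    controls = defaultdict(set)
    for c_id, t1_id, t2_id in records:
        controls[c_id].add((t1_id, t2_id))
    return {c_id: sorted(loops) for c_id, loops in controls.items()}


def one_to_many(records):
    """Keep only the short loops that control more than one long loop."""
    return {c: loops for c, loops in group_by_short_loop(records).items() if len(loops) > 1}


# Identifiers from the C. elegans example: short loop 21591 controls two long loops.
elegans = [(21591, 16554, 16661), (21591, 21565, 21590)]
print(one_to_many(elegans))  # {21591: [(16554, 16661), (21565, 21590)]}
```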



[Figure: C1/C2 short loop 21591 on chromosome 5 controlling T1-T2 long loop 16554-16661 on chromosome 4 and T1-T2 long loop 21565-21590 on chromosome 5]



A double stranded DNA loop of length 50.159 kilo-bases on chromosome 4 is
bounded on the left by a T1 sequence whose identifier is 16554. This T1 control
element has the DNA sequence
TGCCTGAAAAAATTGGCTCCGAGTTAGGACACTTGGGGTGGTCAAAAAAT
TTTGTGACTATTGTCAAATGAAAGATCATAGTTGATAACATAAATTCCCAA 
AGTTTCATAAAAATCGATACGCAGCGAACAAAGTTATCAATT 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 16661. This T2 control element has the DNA sequence 

CACTTGGGGTGGTCAAAAAATTTTGTGATTATTGTCAAATGA 

GGTTGATAACATAAATTCCCAAAGTTTCATAAAAATCGATACGCAGCGAA 

CAAAGTTATGATTTTTGACCCGGAACTTATTTGGAGACCTA

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

C23H5.7 C23H5.8a C23H5.3 C23H5.2 C23H5.9 C23H5.1 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 5 whose identifier is 21591 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene F25A2.1 and has the DNA sequence

TATTGTCAAATGAAAGATCATGGTTGATAACATAAA 

AAAAATCGATACGCAGCGAACAAAGTTATGATTTTTGACCCGGAACT

TTGGAGACCTAATATT 

The match between the T1 sequence and the C1/C2 sequence is
TTTCATAAAAATCGATACGCAGCGAACAAAGTTAT 



The match between the T2 sequence and the C1/C2 sequence is 
TATTGTCAAATGAAAGATCATGGTTGATAACATAAATTCCCA 



A double stranded DNA loop of length 18.142 kilo-bases on chromosome 5 is 
bounded on the left by a T1 sequence whose identifier is 21565. This T1 control
element has the DNA sequence 

CTCCGAGTTAGGACACTTGGGGTGGACAAAAAATTTTGTGACTATTGTCA 
AATGAAAGATCATGGTTGATAA 



This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 21590. This T2 control element has the DNA sequence

TATTGTCAAATGAAAGATCATGGTTGATAACATAAATTCCCACAATTTCAT 

AAAAATCGATACGCAGCGAACAAAGTTATGATTTTTGACCCG 

TTGGAGACCTAATA 


This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

T21H3.2 T21H3.1 F25A2.1 


The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 5 whose identifier is 21591 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene F25A2.1 and has the DNA sequence
TATTGTCAAATGAAAGATCATGGTTGATAACATAAATTCCCACAATTTCAT

AAAAATCGATACGCAGCGAACAAAGTTATGATTTTTGACCCGGAACTTAT 

TTGGAGACCTAATATT 

The match between the T1 sequence and the C1/C2 sequence is

TATTGTCAAATGAAAGATCATGGTTGATAA 

The match between the T2 sequence and the C1/C2 sequence is 


TATTGTCAAATGAAAGATCATGGTTGATAACATAAATTCCCACAATTTCAT 

AAAAATCGATACGCAGCGAACAAAGTTATGATTTTTGACCCGGAACTTAT 

TTGGAGACCTAATA 
4. Connectrons occur between prokaryotes and their plasmids.

Connectron relationships exist between prokaryotes and their plasmids. These
connectrons implement a control mechanism between the two genomes that makes it
possible for them to form a symbiotic relationship. In the case of D. radiodurans the
relationship is not symmetric: the D. radiodurans genome sends C1/C2 short loops to
the MP1 plasmid, so that a C1/C2 short loop on chromosome 1 controls T1-T2 long
loops on the plasmid.

Example of a prokaryote/plasmid connectron - D. radiodurans 


In this example the existence of the T1-T2 long loops (2654-2694 and 2692-2749) on
chromosome 3, which is the plasmid MP1, is controlled by one C1/C2 short loop (16)
on chromosome 1.
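The asymmetry can be made concrete by counting control edges by direction between the two genomes. This is a sketch only: the record layout (c_id, c_chromosome, t1_id, t2_id, t_chromosome) and the `direction_counts` function are hypothetical, the edges are the short-loop/long-loop pairs listed for the two MP1 long loops in this example, and a zero count in the plasmid-to-chromosome direction reflects only these records, not a genome-wide survey.

```python
from collections import Counter


def direction_counts(records, genome_of):
    """Count short-loop -> long-loop control edges by (source genome, target genome).

    records: (c_id, c_chromosome, t1_id, t2_id, t_chromosome) tuples (hypothetical layout).
    genome_of: maps a chromosome number to a genome label.
    """
    return Counter(
        (genome_of[c_chr], genome_of[t_chr])
        for _c_id, c_chr, _t1, _t2, t_chr in records
    )


genome_of = {1: "chromosome 1", 3: "plasmid MP1"}
records = [
    (16, 1, 2654, 2694, 3), (2768, 3, 2654, 2694, 3), (2653, 3, 2654, 2694, 3),
    (16, 1, 2692, 2749, 3), (2768, 3, 2692, 2749, 3), (2693, 3, 2692, 2749, 3),
    (2653, 3, 2692, 2749, 3),
]
counts = direction_counts(records, genome_of)
print(counts[("chromosome 1", "plasmid MP1")])  # 2
print(counts[("plasmid MP1", "chromosome 1")])  # 0 (Counter returns 0 for absent keys)
```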



[Figure: C1/C2 short loops 16 (chromosome 1), 2768 and 2653 (plasmid MP1) controlling T1-T2 long loop 2654-2694, which contains short loop 2693; and short loops 16, 2768 and 2693 controlling T1-T2 long loop 2692-2749, which contains short loops 2693 and 2695]



A double stranded DNA loop of length 46.903 kilo-bases on chromosome 3 (plasmid
MP1) is bounded on the left by a T1 sequence whose identifier is 2654. This T1
control element has the DNA sequence






WO 01/94542 



PCT/US01/16471 



CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCG 

GTATGCAGCCTGCTCGGAGAGTACGATTCGTCGTTGGCTGCACCGAAGTG 

ACGATGGGGCCATTCCGTGGGGCGCGTTACACCAGGCGACTGTCAGTACA 

GCAATCGAGAGTGGGCTGATCAGCCCACTGTGCGTTCTGGCCATCGACGC 

CTCTTTTCACCGCAAAGCCGGTCAGCACACCGCACACCTCGGCTCGTTCTG 

GAATGGCTGTGCCGCGCGGACC 






This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 2694. This T2 control element has the DNA sequence 






GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGA 

TTCGTCGTTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCGCG 

TTACACCAGGCGACTGTCAGTACAGCAATCGAGAGTGGGCTGATCAGCCC 

ACTGTGCGTTCTGGCCATCGACGCCTCTTTTCACCGCAAAGCCGGTCAGCA 

CACCGCACACCTCGGCTCGTTCTGGAATGGCTGTGCCGCGCGGACCGAAC 

GCGGAATCGAGCAATCCTGTTGT 






This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 






DRB0020 DRB0021 DRB0022 DRB0023 DRB0024 DRB0025 

DRB0027 DRB0030 DRB0032 DRB0033 DRB0034 DRB0035 

DRB0037 DRB0038 DRB0039 DRB0041 DRB0042 DRB0043 

DRB0044 DRB0045 DRB0047 DRB0051 DRB0052 DRB0054

DRB0055 DRB0057 






This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops 

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose identifier is 2693 
controls the expression of the genes of one or more other T1/T2 long loops. This
C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene
DRB0057 and has the DNA sequence

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGAC 
GCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGG 
AC 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 16 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene DR0009 and has the DNA sequence

GCTGTGAAATCACCGCTTCCAATGGGTCTGATGGCCATCCTACAGTACGTT 
CTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCT 
GCTCAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTC 
CCGGTATGCAGCCTGCTCGGAGAGTACGATTCGT 



The match between the T1 sequence and the C1/C2 sequence is

CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCG 

GTATGCAGCCTGCTCGGAGAGTACGATTCGTCGTTGGCTGCACCGAAGTG 

ACGATGGGGCCATTCCGTGGGGCGCGTTACACCAGGCGACTGTCAGTACA 

GCAATCGAGAGTGGGCTGATCAGCCCACTGTGCGTTCTGGCCATCGACGC 

CTCTTTTCACCGCAAAGCCGGTCAGCACACCGCACACCTCGGCTCGTTCTG 

GAATGGCTGTGCCGCGCGGACC 

The match between the T2 sequence and the C1/C2 sequence is 
GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGA
TTCGTCGTTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCGCG 
TTACACCAGGCGACTGTCAGTACAGCAATCGAGAGTGGGCTGATCAGCCC 
ACTGTGCGTTCTGGCCATCGACGCCTCTTTTCACCGCAAAGCCGGTCAGCA 
CACCGCACACCTCGGCTCGTTCTGGAATGGCTGTGCCGCGCGGACCGAAC
GCGGAATCGAGCAATCCTGTTGT 

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose identifier is 2768
controls the expression of the genes in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene DRB0133 and has the
DNA sequence

GCTGTGAAATCACCGCTTCCAATGGGTCTGATGGCCATCCTACAGTACGTT 
CTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCT 
GCTCAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTC

CCGGTATGCAGCCTGCTCGGAGAGTACGATTCGT 



The match between the T1 sequence and the C1/C2 sequence is

CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCG 
GTATGCAGCCTGCTCGGAGAGTACGATTCGTCGTTGGCTGCACCGAAGTG 
ACGATGGGGCCATTCCGTGGGGCGCGTTACACCAGGCGACTGTCAGTACA 
GCAATCGAGAGTGGGCTGATCAGCCCACTGTGCGTTCTGGCCATCGACGC 
CTCTTTTCACCGCAAAGCCGGTCAGCACACCGCACACCTCGGCTCGTTCTG
GAATGGCTGTGCCGCGCGGACC 

The match between the T2 sequence and the C1/C2 sequence is 

GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGA
TTCGTCGTTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCGCG 
TTACACCAGGCGACTGTCAGTACAGCAATCGAGAGTGGGCTGATCAGCCC 
ACTGTGCGTTCTGGCCATCGACGCCTCTTTTCACCGCAAAGCCGGTCAGCA

CACCGCACACCTCGGCTCGTTCTGGAATGGCTGTGCCGCGCGGACCGAAC 

GCGGAATCGAGCAATCCTGTTGT 

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose identifier is 2653
controls the expression of the genes in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene DRB0017 and has the
DNA sequence

CGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGC

GTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTAT 
GCAGCCTGCTCGGAGAGTACGATTCGTCGTTGGCTGCACCGAAGTGACGA 
TGGGGCCATTCCGTGGGGCGCGTTACACCAGGCGA 

The match between the T1 sequence and the C1/C2 sequence is

CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCG 
GTATGCAGCCTGCTCGGAGAGTACGATTCGTCGTTGGCTGCACCGAAGTG 
ACGATGGGGCCATTCCGTGGGGCGCGTTACACCAGGCGACTGTCAGTACA 
GCAATCGAGAGTGGGCTGATCAGCCCACTGTGCGTTCTGGCCATCGACGC
CTCTTTTCACCGCAAAGCCGGTCAGCACACCGCACACCTCGGCTCGTTCTG 
GAATGGCTGTGCCGCGCGGACC 

The match between the T2 sequence and the C1/C2 sequence is 


GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGA 
TTCGTCGTTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCGCG 
TTACACCAGGCGACTGTCAGTACAGCAATCGAGAGTGGGCTGATCAGCCC 
ACTGTGCGTTCTGGCCATCGACGCCTCTTTTCACCGCAAAGCCGGTCAGCA 
CACCGCACACCTCGGCTCGTTCTGGAATGGCTGTGCCGCGCGGACCGAAC
GCGGAATCGAGCAATCCTGTTGT 
A double stranded DNA loop of length 68.612 kilo-bases on chromosome 3 (plasmid
MP1) is bounded on the left by a T1 sequence whose identifier is 2692. This T1
control element has the DNA sequence

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGAC 
GCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGG
AC 


This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 2749. This T2 control element has the DNA sequence 

AGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCT 
CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCG

GT 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

DRB0059 DRB0060 DRB0061 DRB0062 DRB0064 DRB0065
DRB0066 DRB0067 DRB0068 DRB0069 DRB0070 DRB0072
DRB0073 DRB0074 DRB0076 DRB0077 DRB0079 DRB0080
DRB0081 DRB0083 DRB0085 DRB0086 DRB0087 DRB0088
DRB0089 DRB0090 DRB0092 DRB0093 DRB0094 DRB0096
DRB0097 DRB0098 DRB0102 DRB0103 DRB0104 DRB0105
DRB0106 DRB0107 DRB0111 DRB0112

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops
A C1/C2 short loop on chromosome 3 (plasmid MP1) whose identifier is 2693
controls the expression of the genes of one or more other T1/T2 long loops. This
C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene
DRB0057 and has the DNA sequence


CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGAC 
GCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGG 
AC 

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose identifier is 2695
controls the expression of the genes of one or more other T1/T2 long loops. This
C1/C2 short loop is expressed as an RNA single strand that is 3'UTR to the gene
DRB0057 and has the DNA sequence

GCTGAACGCCCTGAATCTCTCCCGGTATGCAGCCTGCTCGGAGAGTACGA

TTCGTCGTTGGCTGCACCGAAGTGACGATGGGGCCATTCCGTGGGGCGCG 
TTACACCAGGCGACTGTCAGTACAGCAATCGAGAGTGGGCTGATCAGCCC 
ACTGTGCGTTCTGGCCATCGACGCCTCTTTTCACCGCAAAGCCGGTCAGCA 
CACCGCACACCTCGGCTCGTTCTGGAATGGCTGTGCCGCGCGGACCGAAC 

GCGGAATCGAGCAATCCTGTTGT

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 16 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene DR0009 and has the DNA sequence

GCTGTGAAATCACCGCTTCCAATGGGTCTGATGGCCATCCTACAGTACGTT 
CTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCT
GCTCAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTC 
CCGGTATGCAGCCTGCTCGGAGAGTACGATTCGT 
The match between the T1 sequence and the C1/C2 sequence is

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGAC 
GCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGG
AC 

The match between the T2 sequence and the C1/C2 sequence is 

AGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCT
CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCG 
GT 

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose identifier is 2768
controls the expression of the genes in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene DRB0133 and has the
DNA sequence

GCTGTGAAATCACCGCTTCCAATGGGTCTGATGGCCATCCTACAGTACGTT 
20 CTCAGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCT 
GCTCAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTC
CCGGTATGCAGCCTGCTCGGAGAGTACGATTCGT...CGGACCGAACGCGGA
ATCGAGCAATCCTGTTGTGCCCTCATTGATGTCCAGCACCGGCAGGCCTTG 
ACGGTCGATGTCCGTCAGACCCTGACCGGGTCTGAGGCTCCAACTCGTCT 
GGAACAG

The match between the T1 sequence and the C1/C2 sequence is

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGAC 
GCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGG
AC 
The match between the T2 sequence and the C1/C2 sequence is

AGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCT 
CAGCGTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCG 
GT 

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose identifier is 2693
controls the expression of the genes in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene DRB0057 and has the
DNA sequence

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGAC 
GCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTCCTGG 
AC 

The match between the T1 sequence and the C1/C2 sequence is

CTGATGGCCATCCTACAGTACGTTCTCAGCGCGGTCCCGCTGCGCAAGAC 

GCAGCGGAATTTCCTGACCGTGCTGCTCAGCGTTTTTCTCGCTGTTC 

AC 

The match between the T2 sequence and the C1/C2 sequence is 

AGCGCGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCT 
CAGCGTTTTTCTCGCTGTTCCTGGAC 

A C1/C2 short loop on chromosome 3 (plasmid MP1) whose identifier is 2653
controls the expression of the genes in this T1/T2 long loop. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene DRB0017 and has the
DNA sequence
CGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGC
GTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGTAT 
GCAGCCTGCTCGGAGAGTACGATTCGTCGTTGGCTGCACCGAAGTGACGA 
TGGGGCCATTCCGTGGGGCGCGTTACACCAGGCGA 


The match between the T1 sequence and the C1/C2 sequence is

CGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGC 
GTTTTTCTCGCTGTTCCTGGAC 


The match between the T2 sequence and the C1/C2 sequence is 

CGGTCCCGCTGCGCAAGACGCAGCGGAATTTCCTGACCGTGCTGCTCAGC 
GTTTTTCTCGCTGTTCCTGGACGGCTGAACGCCCTGAATCTCTCCCGGT 

5. Connectrons occur in plants and higher animals

Connectron relationships exist in plants and higher animals.

Example of a plant connectron - A. thaliana

In this example the existence of the T1-T2 (423-469) long loop is controlled by six
C1/C2 short loops (972, 21396, 422, 21762, 21813 and 10882). The T1-T2 long loop
controls the expression of six genes on chromosome 2 in addition to two C1/C2 short
loops (426 and 430).



[Figure: C1/C2 short loops 972, 21396, 422, 21762, 21813 and 10882 controlling T1-T2 long loop 423-469 on chromosome 2, which contains short loops 426 and 430]

A double stranded DNA loop of length 42.285 kilo-bases on chromosome 2 is 
bounded on the left by a T1 sequence whose identifier is 423. This T1 control
element has the DNA sequence 

TATCTCTTTAAGGATTAAAAAGTCAAATAC
ATTAAAAAACGAAATA 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 469. This T2 control element has the DNA sequence 

TACTAATTTAATTAATT
TTCAAAAATAATAACC 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes

At2g02070 At2g02080 At2g02090 At2g02100 At2g02120 At2g02130 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops

A C1/C2 short loop on chromosome 2 whose identifier is 426 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene At2g02060 and has the
DNA sequence

TTCCAAAAATAATAACCAATCAAAATCAACATATAAGATTTGATATCTAA 
ATTTT 



A C1/C2 short loop on chromosome 2 whose identifier is 430 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene At2g02060 and has the
DNA sequence

TTGCGGAAAAATAATATCATCATTA
ATAT 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 2 whose identifier is 972 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene At2g04240 and has the DNA sequence

GTATGCCATTAGAAATAAAATTTTAAAAGTAAATTAATTCATCTCTTTAAA
AATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACG 
AAATACATTATTAATTT 

The match between the T1 sequence and the C1/C2 sequence is


ATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGA 
AATA 

The match between the T2 sequence and the C1/C2 sequence is 


TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATTATTAATT 
T 

A C1/C2 short loop on chromosome 4 whose identifier is 21396 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene At4g15300 and has the DNA
sequence

TGCCATTAGAAATAAAATTTTAAAGAGTAAATTAATTTATCTCTTTAAGGA 
TTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAAAAACGAA

ATACATTATTAATTTCCAAAA 

The match between the T1 sequence and the C1/C2 sequence is

TATCTCTTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTA
ATTAAAAAACGAAATA 
The match between the T2 sequence and the C1/C2 sequence is

TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATTATTAATT 
T 


A C1/C2 short loop on chromosome 2 whose identifier is 422 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene At2g02060 and has the DNA sequence

TAACCTTAATTTTTGTAAGTAATTATATAGGTATGCCATTAGAAATAAAAT
TTTAAAGAGTAAATTAATTTATCTCTTTAAGGATTAAAAAGTCAAATACTA 
ATTTAATTAATTAAATTTAATTAAAAAACGAAATA 

The match between the T1 sequence and the C1/C2 sequence is

TATCTCTTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTA 
ATTAAAAAACGAAATA 

The match between the T2 sequence and the C1/C2 sequence is 


TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATA 

A C1/C2 short loop on chromosome 4 whose identifier is 21762 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene At4g17510 and has the DNA
sequence

TTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAA 
AAACGAAATACATT 


The match between the T1 sequence and the C1/C2 sequence is
TTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAA
AAACGAAATA 

The match between the T2 sequence and the C1/C2 sequence is 


TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATT 

A C1/C2 short loop on chromosome 4 whose identifier is 21813 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene At4g17680 and has the DNA
sequence

TTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAA 
AAACGAAATACATT 


The match between the T1 sequence and the C1/C2 sequence is

TTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTAATTAAA
AAACGAAATA 


The match between the T2 sequence and the C1/C2 sequence is 

TACTAATTTAATTAATTAAATTTAATTAAAAAACGAAATACATT 

A C1/C2 short loop on chromosome 2 whose identifier is 10882 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene At2g26540 and has the DNA
sequence

TATCTCTTTAAGGATTAAAAAGTCAAATACTAATTTAATTAATTAAATTTA
ATTAA 



The match between the T1 sequence and the C1/C2 sequence is

TATCTCTTTAAGGATTA
ATTAA 

The match between the T2 sequence and the C1/C2 sequence is 
TACTAATTTAATTAATTAAATTTAATTAA 



Example of an animal connectron - D. melanogaster

A double stranded DNA loop of length 88.159 kilo-bases on chromosome 4 is 
bounded on the left by a T1 sequence whose identifier is 3340. This T1 control
element has the DNA sequence

ACCTAAAAGAAGTACCGTTTTTTACT 
TATCACTTTTTGACGGACTCCGTC 
TTTTTTGTAAGGGGTAACATCATAAAAATT

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 3372. This T2 control element has the DNA sequence 

AAAAAAGTACCGCGTTTTACTCCTAATTACCAATTCTAACCATCCATATCA
CTTTTTGACGGACTCCGTGAAAATA
GTAAGGGGTAACATCATCAAAATTTGCGAAAAA 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes

[Some of the following gene names have not been determined.] 
CG11207 - CG2186 CG2157

Oriel 



This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 

A C1/C2 short loop on chromosome 4 whose identifier is 3362 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene XXX and has the
DNA sequence

AAAAAAGTACCGCGTTTTACTCCTAATTACCAATTCTAACCATCCATATCA

CTTTTTGACGGACTCCGTTA 
GTAATCAAAATTTGCAAAAAATTGAAAAAAC 

A C1/C2 short loop on chromosome 4 whose identifier is 3364 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene XXX and has the
DNA sequence

CAAAATTTGAATGCAAATCGATTGGGAATCAAAAAACAAACTCAACGAG 
GTATGACATTCCATATTTGGGCCATTATTTCCAA

A C1/C2 short loop on chromosome 4 whose identifier is 3366 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene XXX and has the
DNA sequence
TTTTTTCACAAAAATTAGGA
AAGTTGGGTTTT 



A C1/C2 short loop on chromosome 4 whose identifier is 3369 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene XXX and has the
DNA sequence

AAATCGATTGGGAATCAAAAAACAAACCTCAACGAGGTATGACATTCCAT 
ATCTGGGCCATTATTTCCAATCTTTTGATCAAAATAC



The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is 3373 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene XXX and has the DNA sequence

AAAAAAGTACCGCGTTTTACTCCTAATTACCAATTCTAACCATCC 
CTTTTTGACGGACTCCGTGAAAATAATT

GTAAGGGGTAACATCATCAAAATTTGCGAAAAA 

The match between the T1 sequence and the C1/C2 sequence is

TTTTACTCCTAATTACCAATTC

CGTGAAAATAATTTTTGGCCAAATTTTCGCATT
CAT 

The match between the T2 sequence and the C1/C2 sequence is 

AAAAAAGTACCGCGTTTTACTCCTAATTACCAATTCTAACCATCCATATCA

CTTTTTGACGGACTCCGTGAAAATAATTTTTGGCCAAA

GTAAGGGGTAACATCATCAAAATTTGCGAAAAA 


Example of an animal connectron - H. sapiens



The human genome has been fully sequenced by both the NIH-led global
sequencing project and the Celera Genomics, Inc. project. The gene descriptors for
its chromosomes do not yet exist. Without the positions and directions of the genes,
it is not possible to select from among the possible connectrons to determine the real
connectrons.



Human chromosome 22 has been processed, and there are 31,000 possible connectrons.

The gene descriptors for all the chromosomes of the human genome should become 
available within the year. 
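Why gene positions and directions are both needed can be sketched in a few lines. The Python below assumes a hypothetical annotation record (`Gene`, with 1-based inclusive coordinates and a strand) and a hypothetical helper `host_gene_for_c12` that picks the gene to which a C1/C2 element would be 3'UTR: the gene whose 3' end lies just before the element in that gene's direction of transcription. The `max_gap` cutoff of 200 bases is an illustrative assumption, not a value from this specification.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # lower genomic coordinate (1-based, inclusive)
    end: int     # higher genomic coordinate (1-based, inclusive)
    strand: str  # "+" or "-"


def host_gene_for_c12(c12_start: int, c12_end: int,
                      genes: Sequence[Gene], max_gap: int = 200) -> Optional[Gene]:
    """Return the gene whose 3' end lies closest upstream of the C1/C2 element
    (i.e. the gene to which it would be 3'UTR), or None if none is within max_gap."""
    best: Optional[Gene] = None
    best_gap = max_gap + 1
    for g in genes:
        if g.strand == "+":
            gap = c12_start - g.end   # "+" genes: the 3' end is the higher coordinate
        else:
            gap = g.start - c12_end   # "-" genes: the 3' end is the lower coordinate
        if 0 < gap < best_gap:        # element must lie beyond the 3' end, not overlap it
            best, best_gap = g, gap
    return best


genes = [Gene("geneA", 100, 900, "+"), Gene("geneB", 2000, 3000, "-")]
print(host_gene_for_c12(950, 1050, genes))   # geneA (its 3' end is 50 bases upstream)
print(host_gene_for_c12(1800, 1950, genes))  # geneB (reverse strand, 3' end at 2000)
```

Without the strand, the same element at 1800-1950 could not be assigned: it sits before geneB's start, which is only its 3' end because geneB is on the reverse strand.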

6. Permanent connectrons exist in prokaryotes, archea, single-celled eukaryotes
and multi-celled eukaryotes. 

C1/C2 short loops are normally expressed as the 3'UTR of some gene. A class of
connectron relationships exists that permits one C1/C2 short loop to control the
existence of one or more T1-T2 long loops without being subject to any expression 
controls other than those of the gene to which the C1/C2 is 3'UTR. These connectron 
relationships are described as "permanent". Permanent connectrons exist in 
prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes. 
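One way to operationalise the "permanent" class is positional: a C1/C2 short loop that no T1-T2 long loop surrounds can only be regulated through its host gene. The sketch below assumes hypothetical base-pair coordinates for the loops (the examples in this section give identifiers, not coordinates) and treats a C1/C2 as permanent when no loop on its chromosome encloses its position. `Loop`, `enclosing_loops` and `is_permanent` are illustrative names.

```python
from typing import List, NamedTuple


class Loop(NamedTuple):
    name: str
    chromosome: str
    t1_pos: int   # hypothetical coordinate of the T1 element
    t2_pos: int   # hypothetical coordinate of the T2 element


def enclosing_loops(chromosome: str, position: int, loops: List[Loop]) -> List[Loop]:
    """Every T1-T2 long loop on the same chromosome whose span contains position."""
    return [lp for lp in loops
            if lp.chromosome == chromosome
            and min(lp.t1_pos, lp.t2_pos) < position < max(lp.t1_pos, lp.t2_pos)]


def is_permanent(chromosome: str, position: int, loops: List[Loop]) -> bool:
    """True when no long loop surrounds the C1/C2, so only its host gene controls it."""
    return not enclosing_loops(chromosome, position, loops)


loops = [Loop("3200-3210", "1", 10_000, 103_339)]
print(is_permanent("1", 5_000, loops))   # True: outside the loop
print(is_permanent("1", 50_000, loops))  # False: inside the loop
```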

Example of a prokaryote permanent connectron - E. coli 

In this example the existence of the T1-T2 (3200-3210) long loop is controlled by a 
C1/C2 short loop (3432). The expression of this C1/C2 short loop is controlled only 
by the gene btuB. 

[Diagram: C1/C2 short loop 3432 on chromosome 1 controls the T1-T2 long loop 3200-3210 on chromosome 1.]



A double stranded DNA loop of length 93.339 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 3200. This T1 control
element has the DNA sequence

AAGCGGCACTGCTCTTTAACAATTTATCAGACAATCTGTGTGGGCACTCG
AAGATACGGATTCTTAACGTCGCAAGACGAAAAATGAATACCAAGTCTCA
AGAGTGAACACGTAATTCATTACGAAGTTTAATTCTTTGAGCATCAAACTT
TTAAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCC
TAACACATGCAAGTCGAACGGTAACAGGAAACAGCTTGCTGTTTCGCTGA
CGAGTGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGATGGAGGGG
GATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAA
GAGGGGGACCTTCGGGCCTCTTGCCATC 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 3310. This T2 control element has the DNA sequence

CAGACAATCTGTGTGGGCACTCGAAGATACGGATTCTTAACGTCGCAAGA 
CGAAAAATGAATACCAAGTCTCAAGAGTGAACACGTAATTCATTACGAAG 
TTTAATTCTTTGAGCGTCAAACTTTTAAATTGAAGAGTTTGATCATGGCTC 
AGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACA
GGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTC 
TGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAAT 
ACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCA 
TCGGATGTGCCCAGATGGGATTAGCTAGT 
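Each example states a loop length in kilo-bases (here 93.339 kb), and the abstract places T1 and T2 about 1 kb to 105 kb apart. The sketch below turns that window into a filter. It assumes 1-based inclusive coordinates and, as an assumption, measures the loop from the first base of T1 to the last base of T2; the specification does not say which endpoints it uses. The coordinates in the usage lines are hypothetical, chosen only to reproduce this example's length.

```python
MIN_LOOP_BP = 1_000     # lower bound: "about 1 kb" (abstract)
MAX_LOOP_BP = 105_000   # upper bound: "about 105 kb" (abstract)


def loop_length_bp(t1_start: int, t2_end: int) -> int:
    """Loop length, ASSUMED to run from the first base of T1 to the last base
    of T2, with 1-based inclusive coordinates."""
    return t2_end - t1_start + 1


def format_kb(length_bp: int) -> str:
    """Render a length the way the examples do, e.g. 93339 -> '93.339 kilo-bases'."""
    return f"{length_bp / 1000:.3f} kilo-bases"


def is_candidate_loop(t1_start: int, t2_end: int) -> bool:
    """Keep only T1/T2 pairs whose loop length falls inside the quoted window."""
    return MIN_LOOP_BP <= loop_length_bp(t1_start, t2_end) <= MAX_LOOP_BP


print(format_kb(loop_length_bp(1, 93_339)))                    # 93.339 kilo-bases
print(is_candidate_loop(1, 93_339), is_candidate_loop(1, 500))  # True False
```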


This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 



rrsC gltU rrlC rrfC aspT trpT yifA
yiffi yifB ilvL ilvG_1 ilvM ilvE ilvD
ilvA ilvY ilvC ppiC b3776 rep gppA
rhlB trxA rhoL rho rfe wzzE wecB
rffH wecD wecE wzxE yifM_2 wecG yifK
argX hisR leuT proM aslB aslA hemY
hemX hemD cyaA cyaY b3808 dapF uvrD
b3814 corA yigF yigG rarD yigI pldA
recQ yigJ yigK pldB yigL yigM metR
metE ysgA udp yigN ubiE yigP b3836
yigU yigW_1 rfaH yigC ubiB fadA fadB
pepQ trkH hemG
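Listing the genes that a long loop modulates amounts to an interval query between the T1 and T2 elements. The sketch below uses hypothetical coordinates and a hypothetical helper `genes_in_loop`, and keeps only genes lying entirely between T1 and T2; whether genes that straddle T1 or T2 should also be counted is not stated in this section, so that choice is an assumption.

```python
from typing import List, NamedTuple


class GeneSpan(NamedTuple):
    name: str
    start: int  # lower coordinate (hypothetical)
    end: int    # higher coordinate (hypothetical)


def genes_in_loop(t1_end: int, t2_start: int, genes: List[GeneSpan]) -> List[str]:
    """Names of genes lying entirely between the end of T1 and the start of T2,
    sorted by start coordinate. Genes that overlap T1 or T2 are excluded."""
    inside = [g for g in genes if t1_end < g.start and g.end < t2_start]
    return [g.name for g in sorted(inside, key=lambda g: g.start)]


genes = [GeneSpan("g3", 5000, 6000), GeneSpan("g1", 1500, 2000),
         GeneSpan("g2", 2500, 4000), GeneSpan("outside", 9000, 9500),
         GeneSpan("straddles", 900, 1200)]
print(genes_in_loop(1000, 8000, genes))  # ['g1', 'g2', 'g3']
```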











The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.
A C1/C2 short loop on chromosome 1 whose identifier is 3432 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as
an RNA single strand that is 3'UTR to the gene btuB and has the DNA sequence

TGCGCGGTCAGAAAATTATTTTAAATTTCCTCTTGTCAGGCCGGAATAACT
CCCTATAATGCGCCACCACTGACACGGAACAACGGCAAACACGCCGCCGG
GTCAGCGGGGTTCTCCTGAGAACTCCGGCAGAGAAAGCAAAAATAAATG
CTTGACTCTGTAGCGGGAAGGCGTATTATGCACACC...TGCAACTCGACTC
CATGAAGTCGGAATCGCTAGTAATCGTGGATCAGAATGCCACGGTGAATA
CGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGC
AAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCACTTTGTGAT
TCATGACTGGGGTGAAGTCGTAACAAGGTAACCGTAGGGGAACCTGCGGT
TGGATCACCTCCTTACCTTAAAGAAGCGT

The match between the T1 sequence and the C1/C2 sequence is

AAGCGGCACTGCTCTTTAACAATTTATCAGACAATCTGTGTGGGCACTCG
AAGATACGGATTCTTAACGTCGCAAGACGAAAAATGAATACCAAGTCTCA
AGAGTGAACACGTAATTCATTACGAAGTTTAATTCTTTGAGC


The match between the T2 sequence and the C1/C2 sequence is 

CAGACAATCTGTGTGGGCACTCGAAGATACGGATTCTTAACGTCGCAAGA
CGAAAAATGAATACCAAGTCTCAAGAGTGAACACGTAATTCATTACGAAG
TTTAATTCTTTGAGCGTCAAACTTTTAAATTGAAGAGTTTGATCATGGCTC
AGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAACGGTAACA
GGAAGAAGCTTGCTTCTTTGCTGACGAGTGGCGGACGGGTGAGTAATGTC
TGGGAAACTGCCTGATGGAGGGGGATAACTACTGGAAACGGTAGCTAAT
ACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTTCGGGCCTCTTGCCA
TCGGATGTGCCCAGATGGGATTAGCTAGT
Example of an archea permanent connectron - H. pylori

In this example the existence of the T1-T2 (812-882) long loop is controlled by a
C1/C2 short loop (1241). The expression of this C1/C2 short loop is controlled only
by the gene HP1535.

[Diagram: C1/C2 short loop 1241 on chromosome 1 controls the T1-T2 long loop 812-882 on chromosome 1.]

A double stranded DNA loop of length 96.385 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 812. This T1 control
element has the DNA sequence

TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 882. This T2 control element has the DNA sequence 

TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG



This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 



HP0999 HP1000 HP1001 HP1002 HP1003 HP1005 HP1006
HP1008 HP1009 HPtRNA-Pro HP1010 HP1011 HP1013 HP1015
HP1017 HP1018 HP1020 HP1021 HP1022 HP1023 HP1024
HP1025 HP1027 HP1028 HP1030 HP1031 HP1033 HP1034
HP1038 HP1039 HP1040 HP1041 HP1042 HP1043 HP1044
HP1045 HP1046 HP1051 HP1052 HP1055 HP1056 HP1058
HP1060 HP1065 HPtRNA-Ser HP1066 HP1067 HP1069 HP1070
HP1074 HP1075 HP1076 HP1077 HP1078 HP1079 HP1080
HP1081 HP1083 HP1084 HP1085 HP1088 HP1091 HP1092
HP1093 HP1094 HP1095 HP1096

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 1241 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene HP1535 and has the DNA sequence






TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCC
AAACA

The match between the T1 sequence and the C1/C2 sequence is



TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA 



The match between the T2 sequence and the C1/C2 sequence is 



TAGCGGAACTAAAGCATTCATCCCAAACA 
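The two "match" sequences above are the stretches shared by each T element and the C1/C2 sequence. A standard longest-common-substring dynamic programme reproduces both H. pylori matches from the sequences given in this example. This illustrates exact matching only; the specification does not state which matching procedure was used, and `longest_common_substring` is an illustrative name.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest string occurring in both a and b (first found on ties).

    cur[j] holds the length of the longest common suffix of a[:i] and b[:j];
    O(len(a) * len(b)) time, O(len(b)) memory."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


# Sequences from the H. pylori example (T1 812, T2 882, C1/C2 1241).
T1 = "TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA"
T2 = "TAGCGGAACTAAAGCATTCATCCCAAACACTAAAGATATTTGG"
C12 = "TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCATTCATCCC" + "AAACA"

print(longest_common_substring(T1, C12))  # TTTTACTCATAGGGTTTTTATAGTTCCTAGCGGAACTAAAGCA
print(longest_common_substring(T2, C12))  # TAGCGGAACTAAAGCATTCATCCCAAACA
```

Here the whole of T1 lies inside the C1/C2 sequence, while T2 shares only its first 29 bases, which is exactly the T2 match listed above.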



Example of a single-celled permanent connectron - S. cerevisiae



In this example the existence of the T1-T2 (5515-5533) long loop is controlled by a 
C1/C2 short loop (6102). The expression of this C1/C2 short loop is controlled only 
by the gene YNL339C. 
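This example pairs a C1/C2 on chromosome 14 with a T1/T2 loop on chromosome 12, so a search has to scan the whole genome, not just the C1/C2's own chromosome. A minimal seed-and-pair sketch, not the authors' algorithm: index every k-mer of every chromosome, look up the first k bases of the C1/C2 (a T1 seed) and its last k bases (a T2 seed), and keep seed pairs on one chromosome whose separation falls in an allowed window. k = 20 follows the abstract's "20 bases or more" and the 1 kb to 105 kb window follows the abstract's range; measuring separation between seed start positions is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_kmer_index(chromosomes: Dict[str, str], k: int) -> Dict[str, List[Tuple[str, int]]]:
    """Map every k-mer to the (chromosome, 0-based offset) pairs where it occurs."""
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, seq in chromosomes.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def candidate_t1_t2(c12: str, chromosomes: Dict[str, str], k: int = 20,
                    min_dist: int = 1_000, max_dist: int = 105_000) -> List[Tuple[str, int, int]]:
    """Return (chromosome, T1 seed offset, T2 seed offset) for each seed pair that
    lies on one chromosome at an allowed separation."""
    index = build_kmer_index(chromosomes, k)
    head, tail = c12[:k], c12[-k:]
    hits: List[Tuple[str, int, int]] = []
    for chrom1, p1 in index.get(head, []):
        for chrom2, p2 in index.get(tail, []):
            if chrom1 == chrom2 and min_dist <= p2 - p1 <= max_dist:
                hits.append((chrom1, p1, p2))
    return hits


# Toy genome with k = 5 and a short window so the example stays readable.
chromosomes = {"chrA": "C" * 5 + "ACGTT" + "C" * 30 + "GCAAT" + "C" * 5,
               "chrB": "GCAAT" + "C" * 20}
print(candidate_t1_t2("ACGTTGCAAT", chromosomes, k=5, min_dist=10, max_dist=1000))
# [('chrA', 5, 40)]
```

A Python dictionary of every 20-mer is only practical for small genomes; a larger genome would call for a compact index such as a suffix array.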
[Diagram: C1/C2 short loop 6102 on chromosome 14 controls the T1-T2 long loop 5515-5533 on chromosome 12.]



A double stranded DNA loop of length 6.466 kilo-bases on chromosome 12 is
bounded on the left by a T1 sequence whose identifier is 5515. This T1 control
element has the DNA sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT 
ATTGTAAGAAATTTTTTTTTCTAGGGA 

TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 5533. This T2 control element has the DNA sequence 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTG
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA
TCCGGGTAAGAGACAACAGGGCT

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

YLR467W 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 
A C1/C2 short loop on chromosome 14 whose identifier is 6102 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YNL339C and has the DNA
sequence

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA
AGGTAGTAAGTAGCTTTTGGTTGAACATCCGGGTAAGAGACAACAGGGCT

The match between the T1 sequence and the C1/C2 sequence is

AGGAAATTGTTGTTACGAAAGTCAGTGATTATGTATTGTGTAGTATAGTAT
ATTGTAAGAAATTTTTTTTTCTAGGGAATATGCGTTTTGATGTAGTAGTAT 
TTCACTGTTTTGATTTAGTGTTTGTTGCACGGCAGTAGCGAGAGACAAGTG 
GGAAAGAGTAGGATAAAAAGACAATCTATAAAAAGTAAACATAAAATAA 
AGGTAGTAAGTAGCTTTTGGTTG 


The match between the T2 sequence and the C1/C2 sequence is 

ATTATGTATTGTGTAGTATAGTATATTGTAAGAAATTTTTTTTTCTAGGGA 
ATATGCGTTTTGATGTAGTAGTATTTCACTGTTTTGATTTAGTGTTTGTTGC 
ACGGCAGTAGCGAGAGACAAGTGGGAAAGAGTAGGATAAAAAGACAATC
TATAAAAAGTAAACATAAAATAAAGGTAGTAAGTAGCTTTTGGTTGAACA 
TCCGGGTAAGAGACAACAGGGCT 




Example of a multi-celled permanent connectron - C. elegans 



In this example the existence of the T1-T2 (569-596) long loop is controlled by a
C1/C2 short loop (24442). The expression of this C1/C2 short loop is controlled only
by the gene F20D6.4.

[Diagram: C1/C2 short loop 24442 on chromosome 5 controls the T1-T2 long loop 569-596 on chromosome 1.]



A double stranded DNA loop of length 30.606 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 569. This T1 control
element has the DNA sequence

AAATCGAGCCCGTAAATCGACACAAGCGCTACAGTAGTC 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 596. This T2 control element has the DNA sequence

AGTGCTACAGTAGTCATTTAAAGAATTACTGTAGTTTTCGCT 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops.

A C1/C2 short loop on chromosome 5 whose identifier is 24442 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene F20D6.4 and has the DNA sequence



GAGCCCGTAAATCGACACAAGCGCTACAGTAGTCATTTAAAGAATTACTG 
TAGTTTTC 

The match between the T1 sequence and the C1/C2 sequence is

GAGCCCGTAAATCGACACAAGCGCTACAGTAGTC

The match between the T2 sequence and the C1/C2 sequence is

GCTACAGTAGTCATTTAAAGAATTACTGTAGTTTTC
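The abstract describes C1 and C2 as adjacent halves of the C1/C2 sequence. Locating the two match sequences of this C. elegans example inside the C1/C2 sequence makes that concrete: the T1 match occupies the start of the C1/C2 sequence, the T2 match runs to its end, and here the two regions share 12 bases rather than abutting exactly. The sketch uses exact string search only; `locate` and `overlap` are illustrative names.

```python
from typing import Tuple


def locate(element: str, c12: str) -> Tuple[int, int]:
    """Half-open [start, end) span of the first exact occurrence of element in c12."""
    start = c12.find(element)
    if start < 0:
        raise ValueError("element does not occur in the C1/C2 sequence")
    return start, start + len(element)


def overlap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of bases shared by two half-open spans (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Sequences from the C. elegans example (T1 569, T2 596, C1/C2 24442).
C12 = "GAGCCCGTAAATCGACACAAGCGCTACAGTAGTCATTTAAAGAATTACTG" + "TAGTTTTC"
T1 = "AAATCGAGCCCGTAAATCGACACAAGCGCTACAGTAGTC"
T2 = "AGTGCTACAGTAGTCATTTAAAGAATTACTGTAGTTTTCGCT"
T1_MATCH = "GAGCCCGTAAATCGACACAAGCGCTACAGTAGTC"
T2_MATCH = "GCTACAGTAGTCATTTAAAGAATTACTGTAGTTTTC"

assert T1_MATCH in T1 and T2_MATCH in T2   # each match is drawn from its T element
c1 = locate(T1_MATCH, C12)
c2 = locate(T2_MATCH, C12)
print(c1, c2, overlap(c1, c2))  # (0, 34) (22, 58) 12
```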
7. Transient connectrons exist in prokaryotes, archea, single-celled eukaryotes
and multi-celled eukaryotes. 

A class of connectron relationships exists that permits one C1/C2 short loop to control
the existence of one or more T1-T2 long loops such that this C1/C2 short loop is itself
subject to expression control by another T1-T2 long loop which surrounds it. These
connectron relationships are described as "transient". Transient connectrons exist in
prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes.

Example of a prokaryote transient connectron - E. coli

In this example the existence of the T1-T2 (3227-3329) long loop is controlled by the
C1/C2 (3225) short loop. The expression of this C1/C2 short loop is controlled by the
existence of the T1-T2 (3216-3324) long loop. The existence of this T1-T2 long loop
is itself determined by the expression of the C1/C2 (3223) short loop. The C1/C2
(3225) short loop is the transient connectron.
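The chain of control described here can be held in two lookup tables: which long loop each C1/C2 controls, and which long loop surrounds each C1/C2. The sketch below transcribes only the relationships stated in this E. coli example and walks upward from the transient C1/C2 (3225) to the C1/C2 that ultimately governs it. The table layout and `control_chain` are illustrative assumptions, and inverting `CONTROLS` assumes one controlling C1/C2 per loop.

```python
from typing import Dict, List, Tuple

Loop = Tuple[int, int]  # (T1 identifier, T2 identifier)

# Relationships stated in the E. coli transient example.
CONTROLS: Dict[int, Loop] = {3225: (3227, 3329), 3223: (3216, 3324)}
ENCLOSED_BY: Dict[int, Loop] = {3225: (3216, 3324)}


def control_chain(c12: int, controls: Dict[int, Loop],
                  enclosed_by: Dict[int, Loop]) -> List[int]:
    """Follow C1/C2 -> surrounding long loop -> C1/C2 controlling that loop,
    until reaching a C1/C2 that no loop surrounds (or a controller is unknown)."""
    controller_of = {loop: c for c, loop in controls.items()}  # one controller per loop
    chain, seen = [c12], {c12}
    while c12 in enclosed_by:
        parent = controller_of.get(enclosed_by[c12])
        if parent is None or parent in seen:  # unknown controller, or a cycle
            break
        chain.append(parent)
        seen.add(parent)
        c12 = parent
    return chain


print(control_chain(3225, CONTROLS, ENCLOSED_BY))  # [3225, 3223]
```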

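[Diagram: C1/C2 short loop 3223 on chromosome 1 controls the T1-T2 long loop 3216-3324 on chromosome 1, which contains the C1/C2 short loop 3225.]

[Diagram: C1/C2 short loop 3225 on chromosome 1 controls the T1-T2 long loop 3227-3329 on chromosome 1.]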



A double stranded DNA loop of length 93.464 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 3216. This T1 control
element has the DNA sequence
AGCGCAAGCGAAGCTCTTGATCGAAGCCCCGGTAAACGGCGGCCGTAACT
ATAACGGTCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCTGC
ACGAATGGCGTAATGATGGCCAGGCTGTCTCCACCCGAGACTCAGTGAAA
TTGAACTCGCTGTGAAGATGCAGTGTACCCGCGGCAAGACGGAAAGACCC
CGTGAACCTTTACTATAGCTTGACACTGAACATTGAGCCTTGATGTGTAGG
ATAGGTGGGAGGCTTTGAAGTGTGGACGCCAGTCTGCATGGAGCCGACCT 
TGAAATACCACCCTTTAATGTTTGATGTTCTAACGT 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 3324. This T2 control element has the DNA sequence

CCCGGTAAACGGCGGCCGTAACTATAACGGTCCTAAGGTAGCGAAATTCC 
TTGTCGGGTAAGTTCCGACCTGCACGAATGGCGTAATGATGGCCAGGCTG 
TCTCCACCCGAGACTCAGTGAAATTGAACTCGCTGTGAAGATGCAGTGTA 
CCCGCGGCAAGACGGAAAGACCCCGTGAACCTTTACTATAGCTTGACACT
GAACATTGAGCCTTGATGTGTAGGATAGGTGGGAGGCTTTGAAGTGTGGA 
CGCCAGTCTGCATGGAGCCGACCTTGAAATACCACCCTTTAATGTTTGATG 
TTCTAACGTTGACCCGTAATCCGGGTTGCGGACAGT 

This long T1/T2 double stranded DNA loop modulates the expression of the
following genes

This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is 3225 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene rrlC and has the
DNA sequence

AAACAGAATTTGCCTGGCGGCCGTAGCGCGGTGGTCCCACCTGACCCCAT 
GCCGAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTC 
CCCATGCGAGAGTAGGGAACTGCCAGGCATCAAATTA 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 3323 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene rrlA and has the DNA sequence

GCGAAGCTCTTGATCGAAGCCCCGGTAAACGGCGGCCGTAACTATAACGG
TCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCTGCACGAATG
GCGTAATGATGGCCAGGCTGTCTCCACCCGAGACTCAGTGAAATTGAACT
CGCTGTGAAGATGCAGTGTACCCGCGGCAAGACGGA...AACAGAATTTGC
CTGGCGGCAGTAGCGCGGTGGTCCCACCTGACCCCATGCCGAACTCAGAA
GTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTC

The match between the T1 sequence and the C1/C2 sequence is

GCGAAGCTCTTGATCGAAGCCCCGGTAAACGGCGGCCGTAACTATAACGG
TCCTAAGGTAGCGAAATTCCTTGTCGGGTAAGTTCCGACCTGCACGAATG
GCGTAATGATGGCCAGGCTGTCTCCACCCGAGACTCAGTGAAATTGAACT
CGCTGTGAAGATGCAGTGTACCCGCGGCAAGACGGAAAGACCCCGTGAA
CCTTTACTATAGCTTGACACTGAACATTGAGCCTTGATGTGTAGGATAGGT
GGGAGGCTTTGAAGTGTGGACGCCAGTCTGCATGGAGCCGACCTTGAAAT
ACCACCCTTTAATGTTTGATGTTCTAACGT

The match between the T2 sequence and the C1/C2 sequence is 

CCCGGTAAACGGCGGCCGTAACTATAACGGTCCTAAGGTAGCGAAATTCC
TTGTCGGGTAAGTTCCGACCTGCACGAATGGCGTAATGATGGCCAGGCTG
TCTCCACCCGAGACTCAGTGAAATTGAACTCGCTGTGAAGATGCAGTGTA
CCCGCGGCAAGACGGAAAGACCCCGTGAACCTTTACTATAGCTTGACACT
GAACATTGAGCCTTGATGTGTAGGATAGGTGGGAGGCTTTGAAGTGTGGA
CGCCAGTCTGCATGGAGCCGACCTTGAAATACCACCCTTTAATGTTTGATG
TTCTAACGTTGACCCGTAATCCGGGTTGCGGACAGT



A double stranded DNA loop of length 93.749 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 3227. This T1 control
element has the DNA sequence

AGCGCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTAGGGAACTGCCA
GG

This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 3329. This T2 control element has the DNA sequence

CATGCGAGAGTAGGGAACTGCCAGGCATCAAATAAAACGAAAGGCTCAG
TCG

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 
The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 3225 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene rrlC and has the DNA sequence

AAACAGAATTTGCCTGGCGGCCGTAGCGCGGTGGTCCCACCTGACCCCAT
GCCGAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAGTGTGGGGTCTC 
CCCATGCGAGAGTAGGGAACTGCCAGGCATCAAATTA 

The match between the T1 sequence and the C1/C2 sequence is

AGCGCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTAGGGAACTGCCA
GG

The match between the T2 sequence and the C1/C2 sequence is

CATGCGAGAGTAGGGAACTGCCAGGCATCAAAT



Example of an archea transient connectron - M. jannaschii 

In this example the existence of the T1-T2 (1139-1159) long loop is controlled by the
C1/C2 (533) short loop. The expression of this C1/C2 short loop is controlled by the 
existence of the T1-T2 (532-622) long loop. The existence of this T1-T2 long loop is 
itself determined by the expression of the C1/C2 (1629) short loop. The C1/C2 (533) 
short loop is the transient connectron. 

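[Diagram: C1/C2 short loop 1629 on chromosome 1 controls the T1-T2 long loop 532-622 on chromosome 1, which contains the C1/C2 short loop 533.]

[Diagram: C1/C2 short loop 533 on chromosome 1 controls the T1-T2 long loop 1139-1159 on chromosome 1.]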



A double stranded DNA loop of length 78.672 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 532. This T1 control
element has the DNA sequence

ATATGTTTGAAATTTGAAAATAAGAGTATTTAG 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 622. This T2 control element has the DNA sequence 

TTGAAAATAAGAGCATTTAGAAGTTATTAATTAGTTCAAAGGATTTT 
This long T1/T2 double stranded DNA loop modulates the expression of the
following genes

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is 533 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene MJ0485 and has the DNA
sequence

ATTTTTATTTAATTTCTA 
GTTTATTGAATT

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 1629 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene MJ1597 and has the DNA sequence

ATATGTTTGAAATTTGAAAATAAGAGTATTTAG
AAGGATTTTTATTTAATTTCTAAGGGTTTC
TTGAGTTTATTGAATTATTCAGATTTTTAAAAATTA

The match between the T1 sequence and the C1/C2 sequence is

ATATGTTTGAAATTTGAAAATAAGAGTATTTAG 

The match between the T2 sequence and the C1/C2 sequence is 

ATTTAGAAGTTATTAATTAGTTCAAAGGATTTT 



A double stranded DNA loop of length 14.509 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 1139. This T1 control
element has the DNA sequence

ATTTATTAATTAGTTCAAAGGA^ 
TTTGATTGTTTAAAATATTTGAGTTTA 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 1159. This T2 control element has the DNA sequence

ATTTAATTTCTAAGGGTTAGCTGGTTTGATTATTT 
TGAATTATTCAGATTTTTAAAAATTA 



This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 
MJ1096 MJ1097 tRNA-Arg-3 MJ1098 MJ1099 MJ1100 MJ1101
MJ1102 MJ1103 MJ1104 MJ1105 MJ1106 MJ1107 MJ1108 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 533 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene MJ0485 and has the DNA sequence

ATTTTTATTTAATTTCTAA 
GTTTATTGAATT 

The match between the T1 sequence and the C1/C2 sequence is

ATTTTTATTTAATTTCTAAGGGTTAGCTGGTTTGATT 

The match between the T2 sequence and the C1/C2 sequence is 

ATTTAATTTCTAAGGGTTAGCT
TGAATT 



Example of a single-celled transient connectron - S. cerevisiae

In this example the existence of the T1-T2 (2840-2859) long loop is controlled by the
C1/C2 (298) short loop. The expression of this C1/C2 short loop is controlled by the
existence of the T1-T2 (293-320) long loop. The existence of this T1-T2 long loop is
itself determined by the expression of the C1/C2 (86) short loop. The C1/C2 (298)
short loop is the transient connectron.









WO 01/94542 



PCT/US01/16471 






[Diagram: C1/C2 short loop 86 on chromosome 1 controls the T1-T2 long loop 293-320, which contains the C1/C2 short loop 298.]

[Diagram: C1/C2 short loop 298 controls the T1-T2 long loop 2840-2859 on chromosome 7.]



A double stranded DNA loop of length 38.470 kilo-bases on chromosome 2 is
bounded on the left by a T1 sequence whose identifier is 293. This T1 control
element has the DNA sequence

GAATTGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTA 
TCAATATATTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAA 
GCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAAT
AGGATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGT 
ATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTTGAGGAGAAC 
TTCTAGT 



This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 320. This T2 control element has the DNA sequence

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTC 
GAGGAGAACTTCTAGTATATTCTGTA 


This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 
YBL005W-B TS(AGA)B YBL004W YBL003C YBL002W YBL001C
YBR001C YBR002C YBR003W YBR004C YBR005W YBR006W 
YBR007C YBR008C YBR009C YBR010W YBR011C YBR012C 


This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 

A C1/C2 short loop on chromosome 2 whose identifier is 298 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene YBL005W-B and has the
DNA sequence

ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTAT 
CAACTAATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGAT
GACATAAGTTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCA
AGGATTGATAATGTAATAGGATAATGAAACATATAAAACGGAATGAGGA
ATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCT
ATATCCTTGAGGAGAACTTCTAGTATATTCTGTATACCTAATATTATAGCC
TTTATCAACAATGGAATCCCAACAATTATCTCAACATTC

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 86 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YAR009C and has the DNA sequence

ATCTATTACATTATGGGTGGTATGTTGGAATAGAAATCAACTATCATCTAC 
TAACTAGTATTTACATTACTAGTATATTATCATATACGGTGTTAGAAGATG
ACGCAAATGATGAGAAATAGTCATCTAAATTAGTGGAAGCTGAAACGCA
AGGATTGATAATGTAATAGGATCAATGAATATAAACATATAAAACGGAAT
GAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGG
ATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTAATATT 
ATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTCACCCAT 
TTCTCAGAA 

The match between the T1 sequence and the C1/C2 sequence is

AAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTAGAAAT 
ATAGATTCCATTTTGAGGATTCCTATATCCT 


The match between the T2 sequence and the C1/C2 sequence is 

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTC 
GAGGAGAACTTCTAGTATATTCTGTA 


A double stranded DNA loop of length 5.302 kilo-bases on chromosome 7 is bounded
on the left by a T1 sequence whose identifier is 2840. This T1 control element has
the DNA sequence

TCTGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTATC 
AATATATTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAAGC 
TGTCATCGAAGTTAGAGGAAGCTGAAACGCAAGGATTGATAATGTAATAG 
GATCAATGAATATAAACATATAAAACGGAATGAGGAATAATCGTAATATT
AGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTCGAGGAG 
AACTTCTAGTATATTCTGTATACCTAAATTATAGCCTTTATCAACAATGGA 
ATCCCAACAA 

This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 2859. This T2 control element has the DNA sequence

CTATCAACTAATAGTTATATTATCAATATATTATCATATACGGTGTTAAGA
TGATGACATAAGTTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAA 
CGCAAGGATTGATAATGTAATAGGATCAATGAATATAAACATATAAAACG 
GAATGAGGAATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTT 
GAGGATTCCTATATCCTCGAGGAGAACTTCTAGTATATTCTGTATACCTAA
TATTATAGCCTTTATCAACAATGGAATCCCAACAATTATCTCAACATTCAC 
ATATTTCTCAT 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 2 whose identifier is 298 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YBL005W-B and has the DNA sequence


ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTAT 
CAACTAATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGAT 
GACATAAGTTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCA 
AGGATTGATAATGTAATAGGATAATGAAACATATAAAACGGAATGAGGA 
ATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCT
ATATCCTTGAGGAGAACTTCTAGTATATTCTGTATACCTAATATTATAGCC 
TTTATCAACAATGGAATCCCAACAATTATCTCAACATTC 

The match between the T1 sequence and the C1/C2 sequence is

TGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTATCAA
TATATTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAAGCTG
TCATCGAAGTTAGAGGAAGCTGAA

The match between the T2 sequence and the C1/C2 sequence is
CTATCAACTAATAGTTATATTATCAATATATTATCATATACGGTGTTAAGA
TGATGACATAAGTTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAA 



Example of a multi-celled transient connectron - C. elegans 

In this example the existence of the T1-T2 (22072-22108) long loop is controlled by
the C1/C2 (125) short loop. The expression of this C1/C2 short loop is controlled by
the existence of the T1-T2 (110-129) long loop. The existence of this T1-T2 long
loop is itself determined by the expression of the C1/C2 (16859) short loop. The
C1/C2 (125) short loop is the transient connectron.

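[Diagram: C1/C2 short loop 16859 on chromosome 4 controls the T1-T2 long loop 110-129 on chromosome 1, which contains the C1/C2 short loop 125.]

[Diagram: C1/C2 short loop 125 on chromosome 1 controls the T1-T2 long loop 22072-22108 on chromosome 5.]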



A double stranded DNA loop of length 18.855 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 110. This T1 control
element has the DNA sequence

AGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGC

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 129. This T2 control element has the DNA sequence 
TTCTCCCGCATTTTTTGTAGATCTACGTAGATCAAACCGAAATGAGGCACT
TTCTGAATCCACGAGCTAGGCTTAAGCTTAGGCTTAAGCTTAGGCCTTTTC
TCAGGCTTAGGCTTAGGCTTA

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

ZC123.3 ZC123.2 

This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops

A C1/C2 short loop on chromosome 1 whose identifier is 125 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene ZC123.3 and has the
DNA sequence

ACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGTTC 
GGCAAACTCTTTCATTTCAATTTATGAGGGAAGCCAGAA

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is 16859 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene F58E2.7 and has the DNA sequence

CTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTT 
AGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAG
GCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGCT 
TAAGCTTAGACTTA 
The match between the T1 sequence and the C1/C2 sequence is

AGCTTAGGCTTAAGCTTAGGCTTAAGCTTAGGC

The match between the T2 sequence and the C1/C2 sequence is

TAGGCTTAAGCTTAGGCTTAAGCTTAGGC

A double stranded DNA loop of length 51.031 kilo-bases on chromosome 5 is
bounded on the left by a T1 sequence whose identifier is 22072. This T1 control
element has the DNA sequence

CGCAACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATGACCTA 
GTTCGGC 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 22108. This T2 control element has the DNA sequence

TGACAATCGCCTGCCGGACAACGCGTGGAAAAGTGTCGTGTACTCCACAC 
GGACAAATACATTTAGTTTTACAACTAAAATCGAACCGCGACGCGACACG 
CAACGCGACGTAAATCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGT 
TCGGCAAACTCTTCTATTTC

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

F36H9.3 F36H9.4 F36H9.5 F36H9.2 F36H9.1 F36H9.6
The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 125 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene ZC123.3 and has the DNA sequence

ACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGTTC 
GGCAAACTCTTTCATTTCAATTTATGAGGGAAGCCAGAA 

The match between the T1 sequence and the C1/C2 sequence is

ACGCGCCGTAAATCTACCCCAGATATGGCCGAGCCAAAATG

The match between the T2 sequence and the C1/C2 sequence is

CGTAAATCTACCCCAGATATGGCCGAGCCAAAATGGCCTAGTTCGGCAAA 
CTCTT 
8. Self-limiting connectrons occur in prokaryotes, archea, single-celled
eukaryotes and multi-celled eukaryotes

A class of connectron relationships exists that permits one C1/C2 short loop to control
the existence of the T1-T2 long loop that surrounds it. These connectron relationships
are described as "self-limiting". Self-limiting connectrons exist in prokaryotes,
archea, single-celled eukaryotes and multi-celled eukaryotes.
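The three classes defined in sections 6 to 8 differ only in which long loop, if any, surrounds a C1/C2 short loop. The sketch below encodes that distinction as a small classifier over two lookup tables transcribed from the E. coli examples (identifiers only, since no coordinates are given). The table layout and `classify` are illustrative assumptions, not the authors' code; `CONTROLS` maps each C1/C2 to a set of loops because one C1/C2 may control more than one long loop.

```python
from typing import Dict, Set, Tuple

Loop = Tuple[int, int]  # (T1 identifier, T2 identifier)


def classify(c12: int, controls: Dict[int, Set[Loop]],
             enclosed_by: Dict[int, Loop]) -> str:
    """Classify a C1/C2 short loop by the long loop (if any) that surrounds it:
    permanent     - no T1-T2 long loop surrounds it;
    self-limiting - the surrounding loop is one that this C1/C2 controls;
    transient     - a different T1-T2 long loop surrounds it."""
    surrounding = enclosed_by.get(c12)
    if surrounding is None:
        return "permanent"
    if surrounding in controls.get(c12, set()):
        return "self-limiting"
    return "transient"


# Relationships transcribed from the E. coli examples in sections 6, 7 and 8.
CONTROLS = {3432: {(3200, 3210)}, 3225: {(3227, 3329)},
            1705: {(1704, 1718)}, 1713: {(1704, 1718)}}
ENCLOSED_BY = {3225: (3216, 3324), 1705: (1704, 1718), 1713: (1704, 1718)}

for c in (3432, 3225, 1705, 1713):
    print(c, classify(c, CONTROLS, ENCLOSED_BY))
```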

Example of a prokaryotic self-limiting connectron - E. coli

In this example the existence of the T1-T2 (1704-1718) long loop is controlled by
two C1/C2 (1705 and 1713) short loops. The expression of these C1/C2 short loops is
controlled by the existence of the T1-T2 (1704-1718) long loop. The existence of this
T1-T2 long loop is itself determined by the expression of the two C1/C2 (1705 and
1713) short loops. The C1/C2 (1705 and 1713) short loops are the self-limiting
connectrons.
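The circular dependency described here can be illustrated with a toy synchronous Boolean model. In this example the C1/C2 short loops lie inside the loop whose existence they determine, and asnW, the host gene of 1713, appears in that loop's own gene list. The update rules below are assumptions made for illustration, not statements from this specification: the loop exists at step t+1 exactly when the C1/C2 was expressed at step t, and the C1/C2 is expressed at step t+1 exactly when the loop did not exist at step t. Under these rules the system never settles but cycles with period four, which is one possible reading of "self-limiting".

```python
from typing import List, Tuple


def simulate_self_limiting(steps: int) -> List[Tuple[bool, bool]]:
    """Toy model (hypothetical rules): returns (c12_expressed, loop_exists) per step.
    A formed loop silences the genes inside it, including the C1/C2 host gene,
    and the loop can only form after the C1/C2 has been expressed."""
    expressed, loop_exists = True, False   # start: C1/C2 expressed, no loop yet
    history = []
    for _ in range(steps):
        history.append((expressed, loop_exists))
        # Synchronous update: both new values are computed from the old state.
        expressed, loop_exists = (not loop_exists), expressed
    return history


for state in simulate_self_limiting(5):
    print(state)
```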



[Diagram: C1/C2 short loops 1705 and 1713 on chromosome 1 control the T1-T2 long loop 1704-1718 on chromosome 1, which contains both of them.]



A double stranded DNA loop of length 15.259 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 1704. This T1 control
element has the DNA sequence

CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACT 
GTTAATCCGTATGTCACTGGT 
This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 1718. This T2 control element has the DNA sequence 

TTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTC 
5 CAGTCAGAGGAGCCAAATTC 
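Loop lengths here are written in kilo-bases with a decimal point, so 15.259 kilo-bases is 15,259 bases. The abstract gives the T1-T2 spacing as about 1kb to 105kb. The sketch below converts coordinates to kilo-bases and checks the lengths reported in this section against that range; it assumes inclusive 1-based coordinates (the coordinates of these elements are not given in the text) and treats the approximate bounds as hard limits, both of which are assumptions.

```python
MIN_SPAN_BP = 1_000    # "about 1kb" from the abstract, treated here as a hard lower bound
MAX_SPAN_BP = 105_000  # "about 105kb" from the abstract, treated here as a hard upper bound


def loop_length_kb(t1_start: int, t2_end: int) -> float:
    """Loop length in kilo-bases, assuming inclusive 1-based coordinates."""
    return (t2_end - t1_start + 1) / 1000


def within_reported_range(length_bp: int) -> bool:
    """True if a loop length (in bases) lies inside the abstract's 1kb-105kb range."""
    return MIN_SPAN_BP <= length_bp <= MAX_SPAN_BP


# Lengths as printed in this section, in kilo-bases (the dot is a decimal point).
REPORTED_KB = {
    "1704-1718": 15.259,
    "1447-1471": 22.675,
    "293-320": 38.470,
    "17154-17190": 89.919,
    "1537-1559": 4.825,
}

if __name__ == "__main__":
    for loop, kb in REPORTED_KB.items():
        bases = round(kb * 1000)
        print(loop, bases, within_reported_range(bases))
```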



This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

asnT b1978 b1979 b1980 shiA amn b1983 asnW yeeO asnU
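Which genes a long loop "modulates" depends on which genes fall between its T1 and T2 elements. A minimal sketch, assuming gene coordinates are available as (start, end) pairs with start <= end on the loop's chromosome; the function `genes_in_loop` and the example coordinates are hypothetical, and genes that only partly overlap the loop are excluded as a design choice that the text does not settle.

```python
from typing import Dict, List, Tuple


def genes_in_loop(t1_end: int, t2_start: int,
                  genes: Dict[str, Tuple[int, int]]) -> List[str]:
    """Names of genes lying entirely between the T1 and T2 elements, ordered by start.

    genes maps a gene name to (start, end) on the same chromosome as the loop.
    Genes that only partly overlap the loop are excluded.
    """
    inside = [(start, name) for name, (start, end) in genes.items()
              if t1_end < start and end < t2_start]
    return [name for _, name in sorted(inside)]


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    genes = {"geneA": (150, 300), "geneB": (400, 600), "geneC": (900, 1200)}
    print(genes_in_loop(100, 1000, genes))
```

An empty result for a loop that still encloses C1/C2 short loops corresponds to the "geneless" case described in section 9.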

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 



A C1/C2 short loop on chromosome 1 whose identifier is 1705 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop is expressed as an RNA single strand that is 3'UTR to the gene and has the DNA
sequence 

CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACT 
GTTAATCCGTATGTCACTGGTTCGAGTCCAGTCAGAGGAGCCAAATTC 



A C1/C2 short loop on chromosome 1 whose identifier is 1713 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene asnW and has the
DNA sequence 

CACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTAT 
GTCACTGGTTCGAGTCCAGTCAGAGGAGCCAAATT

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 1705 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene and has the DNA sequence

CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACT 
GTTAATCCGTATGTCACTGGTTCGAGTCCAGTCAGAGGAGCCAAATTC 

The match between the T1 sequence and the C1/C2 sequence is

CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACT 
GTTAATCCGTATGTCACTGGT 

The match between the T2 sequence and the C1/C2 sequence is 

TTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTC 
CAGTCAGAGGAGCCAAATTC 
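Read against each other, the sequences of this example show how the two matches sit inside the short loop: the T1 match is the first 71 bases of the 98-base C1/C2 sequence and the T2 match is its last 71 bases, so the two matched regions overlap by 44 bases rather than simply abutting. The sketch below recovers those positions with `str.find`, using the sequences exactly as printed above; the helper `locate_halves` is hypothetical and assumes each match occurs verbatim in the C1/C2 sequence.

```python
from typing import Tuple

Span = Tuple[int, int]  # half-open [start, end) in 0-based coordinates

# Sequences copied from the 1704-1718 example (the matches equal the T elements here).
C1C2_1705 = ("CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACT"
             "GTTAATCCGTATGTCACTGGTTCGAGTCCAGTCAGAGGAGCCAAATTC")
T1_1704 = ("CGCCCCGTTCACACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACT"
           "GTTAATCCGTATGTCACTGGT")
T2_1718 = ("TTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTC"
           "CAGTCAGAGGAGCCAAATTC")


def locate_halves(c1c2: str, t1: str, t2: str) -> Tuple[Span, Span, int]:
    """Spans of the T1 and T2 matches inside the C1/C2 sequence, plus their overlap."""
    s1, s2 = c1c2.find(t1), c1c2.find(t2)
    if s1 < 0 or s2 < 0:
        raise ValueError("a T element does not occur verbatim in the C1/C2 sequence")
    span1, span2 = (s1, s1 + len(t1)), (s2, s2 + len(t2))
    overlap = max(0, min(span1[1], span2[1]) - max(span1[0], span2[0]))
    return span1, span2, overlap


if __name__ == "__main__":
    print(locate_halves(C1C2_1705, T1_1704, T2_1718))  # ((0, 71), (27, 98), 44)
```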

A C1/C2 short loop on chromosome 1 whose identifier is 1713 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene asnW and has the DNA sequence

CACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTAT
GTCACTGGTTCGAGTCCAGTCAGAGGAGCCAAATT 

The match between the T1 sequence and the C1/C2 sequence is

CACGATTCCTCTGTAGTTCAGTCGGTAGAACGGCGGACTGTTAATCCGTAT
GTCACTGGT

The match between the T2 sequence and the C1/C2 sequence is

TTCAGTCGGTAGAACGGCGGACTGTTAATCCGTATGTCACTGGTTCGAGTC 
CAGTCAGAGGAGCCAAATT 



Example of an archea self-limiting connectron - M. jannaschii

In this example the existence of the T1-T2 (1447-1471) long loop is controlled by
two C1/C2 (1448 and 1470) short loops, and the expression of these two short loops
is in turn controlled by the existence of the same T1-T2 (1447-1471) long loop.
Because the C1/C2 (1448 and 1470) short loops lie inside the long loop that they
control, they are the self-limiting connectrons.



Diagram: the C1/C2 short loops 1448 and 1470 (chromosome 1) act on the T1-T2 long loop 1447-1471 on chromosome 1, which encloses them.



A double stranded DNA loop of length 22.675 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 1447. This T1 control
element has the DNA sequence

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTA
CCATTACTTGGAAATCTATTTAAAACCTCTTTAATCTTATGATA



This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 1471. This T2 control element has the DNA sequence

CAACTAACAACCGTATCGAATTTACCATTACTTGGAAATCTATTTAAAACC
TCTTTAATCTTGTGATAATAAATTCTAATCGATTCGTGACTTAT 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

MJ1402 MJ1403 MJ1404 MJ1405 MJ1406 MJ1407 MJ1408 
MJ1409 MJ1410 MJ1411 MJ1412 MJ1413 MJ1414 MJ1415 
MJ1416 MJ1417 MJ1418 MJ1419 MJ1420 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is 1448 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop is expressed as an RNA single strand that is 3'UTR to the gene MJ1401 and has
the DNA sequence

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTA
CCATTACTTGGAAATCTATTTAAAACCTCTTTAATCTTATGATAATAAATT
CTAATCGATTCGTGACTTAT

A C1/C2 short loop on chromosome 1 whose identifier is 1470 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop is expressed as an RNA single strand that is 3'UTR to the gene MJ1420 and has
the DNA sequence

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTA
CCATTACTTGGAAATCTATTTAAAACCTCTTTAATCTTGTGATAATAAATT
CTAATCGATTCGTG

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 1 whose identifier is 1470 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as
an RNA single strand that is 3'UTR to the gene MJ1420 and has the DNA sequence

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTA
CCATTACTTGGAAATCTATTTAAAACCTCTTTAATCTTGTGATAATAAATT
CTAATCGATTCGTG

The match between the T1 sequence and the C1/C2 sequence is

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTA
CCATTACTTGGAAATCTATTTAAAACCTCTTTAATCTT

The match between the T2 sequence and the C1/C2 sequence is 

CAACTAACAACCGTATCGAATTTACCATTACTTGGAAATCTATTTAAAACC
TCTTTAATCTTGTGATAATAAATTCTAATCGATTCGTG

A C1/C2 short loop on chromosome 1 whose identifier is 1448 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene MJ1401 and has the DNA sequence

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTA 

CCATTACTTGGAAATCTATTTAAAACCTCTTTAATCTTATGATAATAAATT 

CTAATCGATTCGTGACTTAT 

The match between the T1 sequence and the C1/C2 sequence is

TTATAGAACATTATGAAGCTTTTTACTCAACTAACAACCGTATCGAATTTA
CCATTACTTGGAAATCTATTTAAAACCTCTTTAATCTTATGATA 

The match between the T2 sequence and the C1/C2 sequence is 


CAACTAACAACCGTATCGAATTTACCATTACTTGGAAATCTATTTAAAACC 
TCTTTAATCTT 



Example of a single-celled self-limiting connectron - S. cerevisiae

In this example the existence of the T1-T2 (293-320) long loop is controlled by the
C1/C2 (298) short loop, and the expression of this short loop is in turn controlled
by the existence of the same T1-T2 (293-320) long loop. Because the C1/C2 (298)
short loop lies inside the long loop that it controls, it is the self-limiting
connectron.

Diagram: the C1/C2 short loop 298 (chromosome 2) acts on the T1-T2 long loop 293-320 on chromosome 2, which encloses it.



A double stranded DNA loop of length 38.470 kilo-bases on chromosome 2 is
bounded on the left by a T1 sequence whose identifier is 293. This T1 control
element has the DNA sequence

GAATTGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTA
TCAATATATTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAA
GCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAAT
AGGATAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGT
ATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTTGAGGAGAAC
TTCTAGT

This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 320. This T2 control element has the DNA sequence

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCTC 
GAGGAGAACTTCTAGTATATTCTGTA 


This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

YBL005W-B TS(AGA)B YBL004W YBL003C YBL002W YBL001C
YBR001C YBR002C YBR003W YBR004C YBR005W YBR006W
YBR007C YBR008C YBR009C YBR010W YBR011C YBR012C

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 

A C1/C2 short loop on chromosome 2 whose identifier is 298 controls the expression
of the genes of one or more other T1/T2 long loops. This C1/C2 short loop is
expressed as an RNA single strand that is 3'UTR to the gene YBL005W-B and has the
DNA sequence

ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTAT 
CAACTAATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGAT 
GACATAAGTTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCA 
AGGATTGATAATGTAATAGGATAATGAAACATATAAAACGGAATGAGGA 
ATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCT
ATATCCTTGAGGAGAACTTCTAGTATATTCTGTATACCTAATATTATAGCC 
TTTATCAACAATGGAATCCCAACAATTATCTCAACATTC 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 2 whose identifier is 298 controls the expression
of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed as an RNA
single strand that is 3'UTR to the gene YBL005W-B and has the DNA sequence

ATCTATTACATTATGGGTGGTATGTTGGAATAAAAATCCACTATCGTCTAT
CAACTAATAGTTATATTATCAATATATTATCATATACGGTGTTAAGATGAT
GACATAAGTTATGAGAAGCTGTCATCGAAGTTAGAGGAAGCTGAAGTGCA
AGGATTGATAATGTAATAGGATAATGAAACATATAAAACGGAATGAGGA
ATAATCGTAATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCT
ATATCCTTGAGGAGAACTTCTAGTATATTCTGTATACCTAATATTATAGCC
TTTATCAACAATGGAATCCCAACAATTATCTCAACATTC

The match between the T1 sequence and the C1/C2 sequence is

TGTTGGAATAAAAATCCACTATCGTCTATCAACTAATAGTTATATTATCAA
TATATTATCATATACGGTGTTAAGATGATGACATAAGTTATGAGAAGCTG
TCATCGAAGTTAGAGGAAGCTGAAGTGCAAGGATTGATAATGTAATAGGA 
TAATGAAACATATAAAACGGAATGAGGAATAATCGTAATATTAGTATGTA 
GAAATATAGATTCCATTTTGAGGATTCCTATATCCTTGAGGAGAACTTCTA 
GT 


The match between the T2 sequence and the C1/C2 sequence is 

AATATTAGTATGTAGAAATATAGATTCCATTTTGAGGATTCCTATATCCT 

Example of a multi-celled self-limiting connectron - C. elegans 

In this example the existence of the T1-T2 (17154-17190) long loop is controlled by
the C1/C2 (17155) short loop, and the expression of this short loop is in turn
controlled by the existence of the same T1-T2 (17154-17190) long loop. Because the
C1/C2 (17155) short loop lies inside the long loop that it controls, it is the
self-limiting connectron.

Diagram: the C1/C2 short loop 17155 (chromosome 4) acts on the T1-T2 long loop 17154-17190 on chromosome 4, which encloses it.



A double stranded DNA loop of length 89.919 kilo-bases on chromosome 4 is
bounded on the left by a T1 sequence whose identifier is 17154. This T1 control
element has the DNA sequence

AAATTTCCGGCAAATCGGCAAACTGGCAA 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 17190. This T2 control element has the DNA sequence 

AATTTGCCGATTTGCCGAATTTGTCGACA 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following genes 

R08C7.11 M01H9.2 M01H9.3 M01H9.4 M01H9.1 ZK180.1 ZK180.2
ZK180.3 ZK180.4 ZK180.5 ZK180.6 ZK185.3 ZK185.2 

This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops

A C1/C2 short loop on chromosome 4 whose identifier is 17155 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop is expressed as an RNA single strand that is 3'UTR to the gene R08C7.1 and has
the DNA sequence

AAATTTCCGGCAAATCGGCAAACTGGCAATTTGCCGATTTGCCGAATTTGT
CGACA

A C1/C2 short loop on chromosome 4 whose identifier is 17171 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop is expressed as an RNA single strand that is 3'UTR to the gene ZK180.2 and has
the DNA sequence

TGGAAATTTCAGAATTTCAATTTTAATCGGCAAAATTGT^ 
AATTT 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops.

A C1/C2 short loop on chromosome 4 whose identifier is 17155 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene R08C7.1 and has the DNA sequence

AAATTTCCGGCAAATCGGCAAACTGGCAATTTGCCGATTTGCCGAATTTGT
CGACA

The match between the T1 sequence and the C1/C2 sequence is

AAATTTCCGGCAAATCGGCAAACTGGCAA 

The match between the T2 sequence and the C1/C2 sequence is

AATTTGCCGATTTGCCGAATTTGTCGACA

9. Geneless connectrons exist in single-celled and multi-celled eukaryotes

Normally T1-T2 long loops contain genes whose expression is regulated by the 
existence of the long loop. When a T1-T2 long loop does not contain any genes, it is
described as being "geneless". The existence of the T1-T2 long loop is itself 
controlled by one or more C1/C2 short loops that may be on the same or different 
chromosomes. The geneless T1-T2 long loops must contain one or more C1/C2 short 
loops. 
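The geneless case can be stated as a simple classification rule: a long loop with genes inside is an ordinary loop, one with no genes but at least one C1/C2 short loop is geneless, and one with neither does not meet the definition given here. The `LongLoop` record and the labels returned by `classify` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LongLoop:
    """Hypothetical record of one T1-T2 long loop."""
    t1: int
    t2: int
    genes: List[str] = field(default_factory=list)        # genes between T1 and T2
    short_loops: List[int] = field(default_factory=list)  # C1/C2 short loops between T1 and T2


def classify(loop: LongLoop) -> str:
    """Label a long loop according to the definition in the text."""
    if loop.genes:
        return "gene-bearing"
    if loop.short_loops:
        return "geneless"
    # The text requires a geneless loop to contain at least one C1/C2 short loop.
    return "unsupported"


if __name__ == "__main__":
    print(classify(LongLoop(2342, 2344, short_loops=[2343])))
```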

Example of a single-celled geneless connectron - S. cerevisiae

In this example the existence of the T1-T2 (1537-1559) long loop is controlled by 
three C1/C2 (3789, 5289 and 5753) short loops. The expression of 21 C1/C2 (1538
through 1558) short loops is controlled by the existence of the T1-T2 (1537-1559)
long loop.
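In this example the controlling C1/C2 short loops lie on chromosomes 9, 12 and 13 while the long loop is on chromosome 4, so a search for matching sequences cannot be confined to one chromosome. A minimal sketch, assuming each chromosome is held in memory as a string keyed by a label, reports every exact occurrence of a seed of at least 20 bases (the abstract's minimum). It scans only the strand as written; whether the reverse complement should also be searched is not stated in this excerpt. `find_all` and `MIN_SEED` are hypothetical names.

```python
from typing import Dict, List, Tuple

MIN_SEED = 20  # assumed minimum length, from the abstract's "20 bases or more"


def find_all(genome: Dict[str, str], seed: str) -> List[Tuple[str, int]]:
    """Every exact occurrence of seed as (chromosome label, 0-based start).

    Overlapping occurrences are reported; chromosomes are scanned in dict order.
    """
    if len(seed) < MIN_SEED:
        raise ValueError(f"seed shorter than {MIN_SEED} bases")
    hits: List[Tuple[str, int]] = []
    for chrom, seq in genome.items():
        start = seq.find(seed)
        while start != -1:
            hits.append((chrom, start))
            start = seq.find(seed, start + 1)  # step by one so overlaps are not skipped
    return hits
```

Repeated `str.find` rescans each chromosome for every seed; a genome-wide search over many candidate seeds would more sensibly build an index once and query it repeatedly.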

Diagram: the C1/C2 short loops 3789 (chromosome 9), 5289 (chromosome 12) and 5753 (chromosome 13) act on the T1-T2 long loop 1537-1559 on chromosome 4, which encloses the short loops 1538 through 1558.



A double stranded DNA loop of length 4.825 kilo-bases on chromosome 4 is bounded
on the left by a T1 sequence whose identifier is 1537. This T1 control element has
the DNA sequence 

ATGAGATATATGTGGGTAATTAGATAATTGTTGGGATTCCATTGTTGATAA 
AGGCTATAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGAT 
TTAGGAATCCATAAAAGGGAATCTGCAATTCTACACAATTCTATAAATAT 
TATTATCATCGTTTTATATGTTAATATTCATTGATCCTATTACATTATCAAT
CCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGT
CATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATACTAGTTA
GTAGATGATAGTTGATTTTTATTCCAACATACCACCCATAATGTAATAGAT
CTAAT

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 1559. This T2 control element has the DNA sequence 

ATGAGATATATGTGGGTAATTAGATAATTGTTGGGATTCCATTGTTGATAA 
AGGCTATAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGAT
TTAGGAATCCATAAAAGGGAATCTGCAATTCTACACAATTCTATAAATAT 
TATTATCATCGTTTTATATGTTAATATTCATTGATCCTATTACATTATCAAT 
CCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGT 
CATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATACTAGTTA 
GTAGATGATAGTTGATTTTTATTCCAACATACCACCCATAATGTAATAGAT
CTAAT 

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops

A C1/C2 short loop on chromosome 4 whose identifier is 1538 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence

ATGAGATATATGTGGGTAATTAGATAATTGTTGGGATTCCATTGTTGATAA 
AGGCTATAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGAT 
TTAGGAATCCATAAAAGGGAATCTGCAATTCTACACAATTCTATAAATAT 
TATTATCATCGTTTTATATGTTAATATTCATTGATCCTATTACATTATCAAT
CCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGT 
CATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATACTAGTTA 
GTAGATGATAGTTGATTTTTATTCCAACATACCACCCATAATGTAATAGAT
CTAATGAATCCATTTGTTTGTTAATAGTTT

This T1-T2 loop also modulates the C1/C2 short loops numbered 1539 to 1557 


A C1/C2 short loop on chromosome 4 whose identifier is 1558 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 

AGCTTCTCATAACTTATGTCATCATCTTAACACCGTATATGATAATATATT 
GATAATATAACTTGTTGGAATAAAAATCAACTATCATCTACTAACTAGTAT 
TTACGTTACTAGTATATTATCATATACGGTGTTAGAAGATGACGCAAATG 
ATGAGAAATAGTCATCTAAATTAGTGGAAGCTGA...GTCTATCTGGCGAAT
ATAAATTTTTACGCTACACACGTCATCGACATCTAAATATGACAGTCGCTG 
AACTGTTCTTAGATATCCATGCTATTTATGAAGAACAACAGGGATCGAGA 
AACAG 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 9 whose identifier is 3789 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene YIL059C and has the DNA
sequence 


TTTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCA 
GCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACA 
CCGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAG 
TTGATTTTTATTCCAACAGTAT 

The match between the T1 sequence and the C1/C2 sequence is

TTTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCA
GCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACA 
CCGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAG 
TTGATTTTTATTCCAACA 


The match between the T2 sequence and the C1/C2 sequence is 

TTTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCA 
GCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACA 
CCGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAG
TTGATTTTTATTCCAACA 

A C1/C2 short loop on chromosome 12 whose identifier is 5289 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene YLR301W and has the DNA
sequence

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATA 
TTAGGTATGTAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCAT 
AAAAGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTT
TTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAG 
CTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACAC 
CGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGT 
TGATTTTTATTCCAACAC 

The match between the T1 sequence and the C1/C2 sequence is

AGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAA 
TCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTA 
ATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAA
TTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGAT 
AATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATT
CCAACA

The match between the T2 sequence and the C1/C2 sequence is 


AGGATTTAGGAATCCATAAAAGGGAATCTGCAATTCTACACAATTCTATA 
AATATTATTATCATCGTTTTATATGTTAATATTCATTGATCCTATTACATTA 
TCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCTCATCATT 
TGCGTCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATACT 
AGTTAGTAGATGATAGTTGATTTTTATTCCAACA

A C1/C2 short loop on chromosome 13 whose identifier is 5753 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene YMR044W and has the DNA
sequence

TTGAGAAATGGGGGAATGTTGAGATAATTGTTGGGATTCCATTGTTGATA 
AAGGCTATAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCAAGGA 
TATAGGAATCCTCAAAATGGAATCTATATTTCTACATACTAATATTACGAT 
TATTCCTCATTCCGTTTTATATGTTTCATTATCCTATTACATTATCAATCCT
TGCACTTCAGCTTCCTCTAACTTCGATGACAGCTTCTCATAACTTATGTCA 
TCATCTTAACACCGTATATGATAATATATTGATAATATAACTATTAGTTGA 
TAGACGATAGTGGATTTTTATTCCAACAT 

The match between the T1 sequence and the C1/C2 sequence is

AGATAATTGTTGGGATTCCATTGTTGATAAAGGCTATAATATTAGGTATAC 
AGAATATACTAGAAGTTCTCCTC 

The match between the T2 sequence and the C1/C2 sequence is

TTGTTGGGATTCCATTGTTGATAAAGGCTATAATATTAGGTATACAGAATA
TACTAGAAGTTCTCCTCAAGGAT 



Two examples of multi-celled geneless connectrons - C. elegans 

In the first example the existence of the T1-T2 (2342-2344) long loop is controlled 
by the C1/C2 (24114) short loop. The expression of one C1/C2 (2343) short loop is 
controlled by the existence of the T1-T2 (2342-2344) long loop.

Diagram: the C1/C2 short loop 24114 (chromosome 5) acts on the T1-T2 long loop 2342-2344 on chromosome 1, which encloses the short loop 2343.



In the second example the existence of the T1-T2 (29221-29262) long loop is
controlled by the C1/C2 (4291) short loop. The expression of 40 C1/C2 (29222
through 29261) short loops is controlled by the existence of the T1-T2
(29221-29262) long loop.



Diagram: the C1/C2 short loop 4291 (chromosome 1) acts on the T1-T2 long loop 29221-29262 on chromosome 5, which encloses the short loops 29222 through 29261.



A double stranded DNA loop of length 67.059 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 2342. This T1 control
element has the DNA sequence

TGAAAACTACAGTAATTCTTTAAATGACTACTGTAGC

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 2344. This T2 control element has the DNA sequence 

CTACTGTAGCGCTTGTGTCGATTTACGGGCTCGATTT 

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 

A C1/C2 short loop on chromosome 1 whose identifier is 2343 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 

TCGACACAAGCGCTACAGTAGCTATTTAAAGAATTACTGTAGTTTTCGCT^ 
CGAGATATTT 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 5 whose identifier is 24114 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene C13F10.5 and has the DNA
sequence 

GCGAAAACTACAGTAATTCTTTAAATGACTACTGTAGCGCTTGTGTCGATT 
TACGGGCTCGATTTTCG 

The match between the T1 sequence and the C1/C2 sequence is

GAAAACTACAGTAATTCTTTAAATGACTACTGTAGC

The match between the T2 sequence and the C1/C2 sequence is

CTACTGTAGCGCTTGTGTCGATTTACGGGCTCGATTT



A double stranded DNA loop of length 41.297 kilo-bases on chromosome 5 is
bounded on the left by a T1 sequence whose identifier is 29221. This T1 control
element has the DNA sequence 

TTTAAATTTCCCGCCAAAAATTGACTGAAA 
ATTGACAGAAA 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 29262. This T2 control element has the DNA sequence 

TGAAAATTTGAATTTCCCGCCAAAAATTAAC 

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops 

A C1/C2 short loop on chromosome 5 whose identifier is 29222 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 

AATTTCCCGCCAAAAATTGACTGAAAA 
ACAGAAA 

This T1-T2 loop also modulates the C1/C2 short loops numbered 29223 to 29260

A C1/C2 short loop on chromosome 5 whose identifier is 29261 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short

loop has the DNA sequence 

AAAATTGACTGAAAATTTGAATTTCCAGCCAAAAATTGACTGAAAATTTG 
AATT 


The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 1 whose identifier is 4291 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene Y43F8C.5 and has the DNA
sequence 

AAAATTAACTGAAAATTTGAATTTCCCGCCAAAAATTGACTGAAAATTTG 
AATTTCCCGCCAAAAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAA
TTGACTGAAAATTTGAATTTCCCGCCAAAAATTAATTGAAAATTTGAATTT
CCCGCCAAAAATTAATTGAAACTTTGAATTTTCAA...ATTTCCCGCCAAAA
ATTAATTGAAACTTTGAATTTTCAAATTTCCCGCCAAAAATTGACTGAAAA 
TTTGAATTTCCCGCCAAAAATTAATTGAAAATTTGAATTTTT 
GCCAAAAATGACTGA

The match between the T1 sequence and the C1/C2 sequence is

TTTAAATTTCCCGCCAAAAATTGACTGAAAATTTG 

The match between the T2 sequence and the C1/C2 sequence is

AAAAAAATTGACTGAAAATTTGAATTTCCCGCCAAAAATTGA

10. One connectron controls many geneless connectrons in single-celled and
multi-celled eukaryotes

One C1/C2 short loop can control the existence of many geneless T1-T2 long loops.

Example of a single-celled geneless connectron - S. cerevisiae

In this example the existence of the three T1-T2 (1142-1156, 1243-1272 and
7102-7117) long loops is controlled by the C1/C2 (5289) short loop.
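The control relationships reported in this section can be collected as (controller, long loop) pairs and inverted, which makes the one-to-many pattern explicit. The edge list below is transcribed from the examples in this section; the function name `loops_by_controller` is illustrative. Besides the three loops of this example, 5289 is also listed in section 9 as one of the three controllers of the 1537-1559 loop, so the index shows it acting on four geneless loops.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

LongLoop = Tuple[int, int]  # (T1 identifier, T2 identifier)

# (controlling C1/C2 short loop, controlled T1-T2 long loop) pairs reported in this section.
CONTROL_EDGES: List[Tuple[int, LongLoop]] = [
    (3789, (1537, 1559)),
    (5289, (1537, 1559)),
    (5753, (1537, 1559)),
    (24114, (2342, 2344)),
    (4291, (29221, 29262)),
    (5289, (1142, 1156)),
    (5289, (1243, 1272)),
    (5289, (7102, 7117)),
]


def loops_by_controller(edges: List[Tuple[int, LongLoop]]) -> Dict[int, List[LongLoop]]:
    """Invert the edge list: controlling short loop -> sorted long loops it controls."""
    index: Dict[int, List[LongLoop]] = defaultdict(list)
    for short, loop in edges:
        index[short].append(loop)
    return {short: sorted(loops) for short, loops in index.items()}


if __name__ == "__main__":
    for short, loops in loops_by_controller(CONTROL_EDGES).items():
        print(short, loops)
```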

Diagram: the C1/C2 short loop 5289 (chromosome 12) acts on three T1-T2 long loops: 1142-1156 on chromosome 4 (enclosing the short loops 1143 through 1155), 1243-1272 on chromosome 4 (enclosing 1244 through 1271) and 7102-7117 on chromosome 15 (enclosing 7103 through 7116).



A double stranded DNA loop of length 5.337 kilo-bases on chromosome 4 is bounded
on the left by a T1 sequence whose identifier is 1142. This T1 control element has
the DNA sequence

ATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGG
TATGTAGATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAG 
GGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCATTTTATA
TGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCC 
ACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTAT 
ATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTT
TTATTCCAACA

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 1156. This T2 control element has the DNA sequence

TTTTAATAAGGCAATAATATTAGGTATGTAGATATACTAGAAGTTCTCCTC
CAGGATTTAGGAATCCATAAAAGGGAATCTGCAATTCTACACAATTCTAT 
AAATATTATTATCATCATTTTATATGTTAATATTCATTGATCCTATTACATT 
ATCAATCCTTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCTCATCAT 
TTGCGTCATCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATAC
TAGTTAGTAGATGATAGTTGATTTTTATTCCAACAAGAA

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops

A C1/C2 short loop on chromosome 4 whose identifier is 1143 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 


ATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGG 
TATGTAGATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAG 
GGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCATTTTATA 
TGTTAATATTCATTGATCCTATTACATTATCAAT...CTCTAAGTCTCATTGCC
TTTGTGCCAAAAAATCTGTTTCTAAATTTCTCTTCATTTGTAGACTTAATTA
TACTGATCGTTGATCTACTATCAGTAAGTAAGCCTTTAATAATTGGTTTCT 
TGTTAAGTTCTTGCACAAGGTGACTGAGGTTATTCAATAGCGG 

This T1-T2 loop also modulates the C1/C2 short loops numbered 1144 to 1154

A C1/C2 short loop on chromosome 4 whose identifier is 1155 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short

loop has the DNA sequence 

GAGGAGAACTTCTAGTATATCTACATACCTAATATTATTGCCTTATTAAAA 
ATGGAATCCCAACAATTA 


The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 12 whose identifier is 5289 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene YLR301W and has the DNA
sequence 

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATA 
TTAGGTATGTAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCAT
AAAAGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTT
TTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAG
CTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACAC
CGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGT
TGATTTTTATTCCAACAC

The match between the T1 sequence and the C1/C2 sequence is

ATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGG 
TATGTAGA

The match between the T2 sequence and the C1/C2 sequence is 
TTTTAATAAGGCAATAATATTAGGTATGTAGA

A double stranded DNA loop of length 5.251 kilo-bases on chromosome 4 is bounded
on the left by a T1 sequence whose identifier is 1243. This T1 control element has
the DNA sequence 

CGTGTTTTATCTCATGTTGTTCGTTTTGTTATTGAGATATATGTGGGTAATT
AGATAATTGTTGGGATTCCATTGTTGATAAAGGCTATAATATTAGGTATAC
AGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAA
TCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTA
ATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAA
TTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGAT
AATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATT
CCAACA

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 1272. This T2 control element has the DNA sequence 

TGAGATATATGTGGGTAATTAGATAATTGTTGGGATTCCATTGTTGATAAA 
GGCTATAATATTAGGTATACAGAATATACTAGAAGTTCTCCTCGAGGATTT 
AGGAATCCATAAAAGGGAATCTGCAATTCTACACAATTCTATAAATATTA 
TTATCATCGTTTTATATGTTAATATTCATTGATC...TATACTAGTAACGTAA
ATACTAGTTAGTAGATGATAGTTGATTTTTATTCCAACAGTTATAAGGTTG
TTTCATATGTGTTTTATGAA

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops

A C1/C2 short loop on chromosome 4 whose identifier is 1244 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 

TTTATCTCATGTTGTTCGTTTTGTTATTGAGATATATGTGGGTAATTAGATA 

ATTGTTGGGATTCCATTGTTGATAAAGGCTATAATATTAGGTATACAGAAT 

ATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAATCTGC 

AATTCTACACAATTCTATAAATATTATTATCAT...GTCTCGATGTAGTATAC 

GTATAAATTATTACCTGATACTTCATCTCTAAGTCTCATTGCCTTTGTGCCA 

AAAAATCTGTTTCTAAATTTCTCTTCATTTGTAGACTTAATTATACTGATCG 

TTGATCTACTATCAGTAAGT 

This T1-T2 loop also modulates the C1/C2 short loops numbered 1245 to 1270 

A C1/C2 short loop on chromosome 4 whose identifier is 1271 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 

TGTTGTATCTCAAAATGAGATATGTCAGTATGACAATACGTCATCCTAAAC 

GTTCATAAAACACATATGAAACAACCTTATAACTGTTGGAATAAAAATCA 

ACTATCATCTACTAACTAGTATTTACGTTACTAGTATATTATCATATACGG 

TGTTAGAAGATGACGCAAATGATGAGAAATAGTC...CAACAATGGAATCC 

CAACAATTATCTAATTACCCACATATATCTCATGGTAGCGCCTGTGCTTCG 

GTTACTTCTAAGGAAGTCCACACAAATCAAGATCCGTTAGACGTTTCAGC 

TTCCAAAA 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 12 whose identifier is 5289 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene YLR301W and has the DNA
sequence 

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATA 
TTAGGTATGTAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCAT
AAAAGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTT 
TTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAG 
CTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACAC 
CGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGT
TGATTTTTATTCCAACAC

The match between the T1 sequence and the C1/C2 sequence is

AGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAA 
TCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTA
ATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAA
TTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGAT 
AATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATT 

CCAACA 


The match between the T2 sequence and the C1/C2 sequence is 

AGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCATAAAAGGGAA 
TCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTTTTATATGTTA 
ATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAA
TTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGAT 
AATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATT 

CCAACA 

A double stranded DNA loop of length 5.296 kilo-bases on chromosome 15 is
bounded on the left by a T1 sequence whose identifier is 7102. This T1 control
element has the DNA sequence

CATGATTAATATGACCAATCGGCGTGTGTTTTTGAAAAGTGGGTGAATTTT 
GAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATATTAGGTATGT 
AGAATGTACTAGAAGTTCTCCTCAAGGATTTAGGAATCCATGAAAGGGAA 
TCTGCAATTCTACACAATTCTATAAATATTATTATCATCATTTTATATGTTA 
ATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAGCTTCCACTAA
TTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACACCGTATATGAT 
AATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGTTGATTTTTATT 

CCAACA 

This double stranded DNA loop is bounded on the right by a T2 control element
whose identifier is 7117. This T2 control element has the DNA sequence

TGAAAAGTGGGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAG 
GCAATAATATTAGGTATGTAGAATGTACTAGAAGTTCTCCTCAAGGATTT 
20 AGGAATCCATGAAAGGGAATCTGCAATTCTACACAATTCTATAAATATTA 
TTATCATCATTTTATATGTTAATATTCATTGATCCTATTACATTATCAATCC 
TTGCGTTTCAGCTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCA 
TCTTCTAACACCGTATATGATAATATACTAGTAACGTAAATACTAGTTAGT 
AGATGATAGTTGATTTTTATTCCAACAGTTTTATATACCTCTCTTATTTAGT 

25 ATAAGAA 

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops:

A C1/C2 short loop on chromosome 15 whose identifier is 7103 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 

AAGAACATTGCTGATGTGATGACAAAACCTCTTCCGATAAAAACATTTAA
ACTATTAACTAACAAATGGATTCATTAGATCTATTACATTATGGGTGGTAT
GTTGGAATAAAAATCAACTATCATCTACTAACTAGTATTTACGTTACTAGT
ATATTATCATATACGGTGTTAGAAGATGACGCAAATGATGAGAAATAGTC
ATCTAAATTAGTGGAAGCTGAAACGCAAGGATTGATAATGTAATAGGATC
AATGAATATTAACATATAAAATGATGATAATAATATTTATAGAATTGTGT
AGAATTGCAGATTCCCTTTCATGGATTCCTAAATCCTTGAGGAGAACTTCT
AGTA

This T1-T2 loop also modulates the C1/C2 short loops numbered 7104 to 7115.

A C1/C2 short loop on chromosome 15 whose identifier is 7116 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 

CCATTCTGTGGAGGTGGTACTGAAGCAGGTTGAGGAGAGACATGATGATG 
GTTCTCTGGAACAGCT 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 12 whose identifier is 5289 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene YLR301W and has the DNA
sequence

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATA
TTAGGTATGTAGAATATACTAGAAGTTCTCCTCGAGGATTTAGGAATCCAT
AAAAGGGAATCTGCAATTCTACACAATTCTATAAATATTATTATCATCGTT
TTATATGTTAATATTCATTGATCCTATTACATTATCAATCCTTGCGTTTCAG
CTTCCACTAATTTAGATGACTATTTCTCATCATTTGCGTCATCTTCTAACAC
CGTATATGATAATATACTAGTAACGTAAATACTAGTTAGTAGATGATAGT
TGATTTTTATTCCAACAC

The match between the T1 sequence and the C1/C2 sequence is

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATA
TTAGGTATGTAGAAT

The match between the T2 sequence and the C1/C2 sequence is

GGTGAATTTTGAGATAATTGTTGGGATTCCATTTTTAATAAGGCAATAATA
TTAGGTATGTAGAAT



Example of a multi-celled geneless connectron - C. elegans 

In this example the existence of the three T1-T2 (1142-1156, 14840-15042 and 
15365-15627) long loops is controlled by the C1/C2 (16760) short loop. 
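Claim 18 calls a connectron relationship geneless when no genes lie within its T1-T2 long loop, and the descriptions that follow list the C1/C2 short loops that each long loop surrounds. Both statements reduce to an interval-containment test. The sketch below is a minimal illustration of that test, not the procedure used to produce these listings; it assumes each feature is a (name, start, end) triple on the same chromosome as T1 and T2, and the function names and coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple

# (name, start, end) on the same chromosome as T1 and T2, 0-based and
# end-exclusive. This coordinate convention is an assumption of the sketch.
Feature = Tuple[str, int, int]


def features_inside_loop(t1_end: int, t2_start: int,
                         features: Iterable[Feature]) -> List[str]:
    """Names of features lying entirely between the end of T1 and the start of T2."""
    return [name for name, start, end in features
            if t1_end <= start and end <= t2_start]


def is_geneless(t1_end: int, t2_start: int, genes: Iterable[Feature]) -> bool:
    """True when no gene lies inside the T1-T2 long loop."""
    return not features_inside_loop(t1_end, t2_start, genes)


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only; they are not taken
    # from the genome data described in this document.
    genes = [("geneA", 100, 900), ("geneB", 20_000, 21_500)]
    short_loops = [("3103", 1_200, 1_260), ("3119", 15_000, 15_011)]
    print(is_geneless(1_000, 16_894, genes))                 # True
    print(features_inside_loop(1_000, 16_894, short_loops))  # ['3103', '3119']
```

Requiring a feature to lie wholly inside the loop is a design choice of the sketch; a feature that straddles T1 or T2 would need a separate rule.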



Schematic: C1/C2 short loop 16760 (Chromosome 4) paired with each long loop
  T1-T2 1142-1156 (Chromosome 4), spanning 3103 through 3119
  T1-T2 14840-15042 (Chromosome 4), spanning 14841 through 15041
  T1-T2 15365-15627 (Chromosome 5), spanning 15366 through 15625



A double stranded DNA loop of length 15.894 kilo-bases on chromosome 1 is
bounded on the left by a T1 sequence whose identifier is 3101. This T1 control
element has the DNA sequence

CAAATCGGCAAATTGCCGGAATTGAACATTTCC 

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 3120. This T2 control element has the DNA sequence

AAACGATTTTTCCGGCAAATCGGCAAATTGCCGGAATTGTAATTTCCGGC 
AAAT 



There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops:

A C1/C2 short loop on chromosome 1 whose identifier is 3103 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop has the DNA sequence

TTAAAATTTCCGGCAAATCGGCAAATTGGCAGAAATGAAACTCACGGCAA
ATCGG

This T1-T2 loop also modulates the C1/C2 short loops numbered 3104 to 3118.




WO 01/94542 



PCT/US01/16471 



A C1/C2 short loop on chromosome 1 whose identifier is 3119 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 



ATTTTCCAAAT 


The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is 16760 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene T23E1.2 and has the DNA sequence

GGCAAATTGCCGAAATTGAACATTTCCGGCAAATCGGCAAATTGCCGGAA 
TTGAACATTTCCGGCAAATCGGCAAATTGCCGGAATTGAACATTTCCGGC 

AAATCGGCAAATTGCCGGAATTGA 

The match between the T1 sequence and the C1/C2 sequence is

CAAATCGGCAAATTGCCGGAATTGAACATTTCC 

The match between the T2 sequence and the C1/C2 sequence is 

TTTCCGGCAAATCGGCAAATTGCCGGAATTG 
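Each "match" listing above is the stretch of bases that a T1 or T2 element shares with the C1/C2 sequence. The document does not say how these listings were computed; a longest-common-substring search is one standard way to reproduce them, and the sketch below applies it to T1 element 3101, T2 element 3120 and short loop 16760 from this example. The function and variable names are illustrative.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest run of bases that occurs contiguously in both a and b.

    Standard O(len(a) * len(b)) dynamic programming: run[j] holds the length of
    the common run ending at a[i - 1] and b[j - 1]. Ties keep the first maximum
    found while scanning a from left to right.
    """
    best_len, best_end = 0, 0          # best run length and its end index in a
    prev = [0] * (len(b) + 1)          # run lengths for the previous row
    for i in range(1, len(a) + 1):
        run = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                run[j] = prev[j - 1] + 1
                if run[j] > best_len:
                    best_len, best_end = run[j], i
        prev = run
    return a[best_end - best_len:best_end]


# Sequences copied from the C. elegans example above.
T1_3101 = "CAAATCGGCAAATTGCCGGAATTGAACATTTCC"
T2_3120 = ("AAACGATTTTTCCGGCAAATCGGCAAATTGCCGGAATTGTAATTTCCGGC"
           "AAAT")
C1C2_16760 = ("GGCAAATTGCCGAAATTGAACATTTCCGGCAAATCGGCAAATTGCCGGAA"
              "TTGAACATTTCCGGCAAATCGGCAAATTGCCGGAATTGAACATTTCCGGC"
              "AAATCGGCAAATTGCCGGAATTGA")

if __name__ == "__main__":
    print(longest_common_substring(T1_3101, C1C2_16760))  # CAAATCGGCAAATTGCCGGAATTGAACATTTCC
    print(longest_common_substring(T2_3120, C1C2_16760))  # TTTCCGGCAAATCGGCAAATTGCCGGAATTG
```

For this loop both calls return the listed matches. The quadratic table is adequate for sequences of a few hundred bases; whole-chromosome comparisons would call for an index-based method instead.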



A double stranded DNA loop of length 86.977 kilo-bases on chromosome 3 is
bounded on the left by a T1 sequence whose identifier is 14840. This T1 control
element has the DNA sequence

AAAAATTTCCGGCAAGTCGGCAATTTTCCGAAAATGAAAATTTCCGGCAA
ATCGGCAAATTGCCGGAATTGAAAATTCCTGGCAAATCAGCAAATTTGCG
GCAAATCGGCAATTTGCCGAAAATGAAAATTTCCGGCAAAT

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 15042. This T2 control element has the DNA sequence 

CAAATCGGTAGGTAAATTGGCCAAACTTGAAAATTTCCGGCAAATCGGCA 
AATTCCGCGAACTGAACATTTCCGGCAAATCGGCAAATTGCTCGAACT

There are no genes controlled by this T1/T2 loop.

This long T1/T2 double stranded DNA loop modulates the expression of the
following C1/C2 short loops:

A C1/C2 short loop on chromosome 3 whose identifier is 14841 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop has the DNA sequence

AAAAATTTCCGGCAAGTCGGCAATTTTCCGAAAATGAAAATTTCCGGCAA
ATCGGCAAATTGCCGGAATTGAAAATTCCTGGCAAATCAGCAAATTTGCG
GCAAATCGGCAATTTGCCGAAAATGAAAATTTCCGGCAAAT

This T1-T2 loop also modulates the C1/C2 short loops numbered 14842 to 15040.

A C1/C2 short loop on chromosome 3 whose identifier is 15041 controls the
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short
loop has the DNA sequence

CGGCAATTGCCGTTCGGCAATTTGCCAATTTGCCGGAAATTTTCAATTCCG
GCAA

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2
short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is 16760 controls the 
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed 
as an RNA single strand that is 3'UTR to the gene T23E1.2 and has the DNA sequence

GGCAAATTGCCGAAATTGAACATTTCCGGCAAATCGGCAAATTGCCGGAA 
TTGAACATTTCCGGCAAATCGGCAAATTGCCGGAATTGAACATTTCCGGC 

AAATCGGCAAATTGCCGGAATTGA 

The match between the T1 sequence and the C1/C2 sequence is

ATTTCCGGCAAATCGGCAAATTGCCGGAATTGAA 

The match between the T2 sequence and the C1/C2 sequence is 
TGAACATTTCCGGCAAATCGGCAAATTGC 



A double stranded DNA loop of length 98.488 kilo-bases on chromosome 3 is 
bounded on the left by a T1 sequence whose identifier is 15365. This T1 control
element has the DNA sequence 

AAAATTTCCGGCAAATCGGCAATTTGCCAAAAATTGAAATTTCCGGCAAA
TCGGCAATTTGTCAAAAATGAAAATTTCCGGCAAATCGGCAAATTGCCGA
AAATGAAAATTTCCGGCAAATCGGCAAACTTCCGGAACTGAAAATTTCCG
GCAAATCGGCAATTTGCCATAAATGAACATTTCCGG...GGCGAAAATTAAA
ATTTCCGCCATATCGGCAATTTGCCAAAAAATTAAAATTTCCGGCAAATC
GGCAAATTGCCGGAATTCAAAATTTCCGGCAAACCGGCAAATTGCCGGAA
CTCAAAATTCCCGGCAAATCAGCAAATTGCCGGAATT

This double stranded DNA loop is bounded on the right by a T2 control element 
whose identifier is 15627. This T2 control element has the DNA sequence 

TGGCAAACCGGCAAATTGCCGGAATTGAACATTTCCGGCAAATCGGCAAT 
TTGCCGGAATTGAAATTT 

There are no genes controlled by this T1/T2 loop. 

This long T1/T2 double stranded DNA loop modulates the expression of the 
following C1/C2 short loops:

A C1/C2 short loop on chromosome 3 whose identifier is 15366 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 

TGCCGATTTGCCGGAAATTTTCATTTTCGGCAATTTGCCGATTTGCCGGAA 
ATTTTCATT 

This T1-T2 loop also modulates the C1/C2 short loops numbered 15366 to 15624 

A C1/C2 short loop on chromosome 3 whose identifier is 15625 controls the 
expression of the genes of one or more other T1/T2 long loops. This C1/C2 short 
loop has the DNA sequence 

TCAAGCAAATTGTCAAATTCGCGGAACTAAACATTTCCGGCAAATCGGCA 
AATT 

The expression of genes in this T1/T2 long loop is controlled by the following C1/C2 
short loops. 

A C1/C2 short loop on chromosome 4 whose identifier is 16760 controls the
expression of the genes in this T1/T2 long loop. This C1/C2 short loop is expressed
as an RNA single strand that is 3'UTR to the gene T23E1.2 and has the DNA sequence

GGCAAATTGCCGAAATTGAACATTTCCGGCAAATCGGCAAATTGCCGGAA 
TTGAACATTTCCGGCAAATCGGCAAATTGCCGGAATTGAACATTTCCGGC 

AAATCGGCAAATTGCCGGAATTGA 

The match between the T1 sequence and the C1/C2 sequence is

ATTTCCGGCAAATCGGCAAATTGCCGGAATT 

The match between the T2 sequence and the C1/C2 sequence is 
CGGCAAATTGCCGGAATTGAACATTTCCGGCAAATCGGCAA 

Claims

What is claimed is: 



1. A method of identifying DNA sequences that control the expression of
different collections of genes in a genome comprising detecting selected DNA
sequences adjacent to some genes excluding exons and introns.

2. A method of identifying DNA sequences that control the expression of
different collections of genes comprising detecting, by computer, one or more pairs
of non-adjacent DNA sequences to which two RNA sequences are bound.

3. A method of identifying DNA sequences that control the expression of 
different collections of genes in a genome comprising detecting changes in 
connectron behavior in the genome. 

4. A method of modifying the expression of different gene collections in a 
genome, comprising detecting changes in connectron behavior as a result of an 
exogenous stimulus. 



5. A method of detecting where and when new genes are being integrated into a
host genome comprising detecting the connectrons in said host genome.

6. A method of detecting the expression effect of different gene collections in a 
given body comprising detecting the back and forth flow of connectrons between the 
chromosomes thereof. 

7. A method of modifying a given body comprising modifying the connectron 
organization therein. 
8. A method of detecting connectron control and target sequences in a given
genome comprising: 

determining the base composition of said genome, 

determining one or more sites of control sequence organization, and/or 

determining one or more sites of target application. 

9. A method of determining the response of a cell in any tissue to changes in the 
cell's environment and/or genetic composition comprising providing a complete 
genomic DNA sequence for the organism and determining the effect of changes in 
connectrons due to application of a given exogenous stimulus to the genome.

10. In prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes,
the tetradic relationship T1=C1 and T2=C2 where T1 and T2 are DNA sequences 20
or more bases in length, where the C1 sequence is adjacent to the C2 sequence, where
the T1 and T2 sequences are on the same chromosome, and where the C1/C2
sequences are on the same chromosome as T1 and T2 or where the C1/C2 sequences
are on a chromosome different from T1 and T2, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,
such that the CI sequence is adjacent to the C2 sequence, 






T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.

11. In prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes,
the connectron relationship that permits many different C1/C2 short loops to control
the existence of a T1-T2 long loop and wherein said C1/C2 short loops can be on the
same chromosome or on different chromosomes from the T1-T2 long loop, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 540 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.

12. In prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes,
the connectron relationship that permits one C1/C2 short loop to control the existence
of many T1-T2 long loops, the C1/C2 short loop can be on the same chromosome or
on different chromosomes from the T1-T2 long loops, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.

13. The connectron relationships between prokaryotes and their plasmids wherein
said connectrons implement a control mechanism between the two genomes that
makes it possible for them to form a symbiotic relationship, and in the case of D.
radiodurans the relationship is not symmetric, and the D. radiodurans genome sends
C1/C2 short loops to the MP1 plasmid, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.

14. The connectron relationships that exist in plants and higher animals.

15. In prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes,
the connectron relationship that permits one C1/C2 short loop to control the existence
of one or more T1-T2 long loops without being subject to any expression controls
other than those of the gene to which the C1/C2 is 3'UTR, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart,

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart, and

3'UTR - the untranslated 3' end of an mRNA, which lies beyond the end of the
last exon; a stop codon in the mRNA causes the ribosome to stop the translation
of the mRNA into protein.

16. In prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes,
the connectron relationship that permits one C1/C2 short loop to control the existence
of one or more T1-T2 long loops such that this C1/C2 short loop is itself subject to
expression control by another T1-T2 long loop which surrounds it, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.

17. In prokaryotes, archea, single-celled eukaryotes and multi-celled eukaryotes,
the connectron relationship that permits one C1/C2 short loop to control the existence
of the T1-T2 long loop that surrounds it, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 50 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.

18. The connectron relationships that do not have any genes within the T1-T2 long
loop, wherein:

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, and the T2 or T1
sequences must be between about 1kb and 105kb apart.

19. The geneless connectron relationship where one C1/C2 short loop controls the
existence of many geneless T1-T2 long loops, wherein:

C1 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C2 sequence must occur in the same chromosome as the C1
sequence,

C2 sequence - any positive or negative strand DNA sequence of 20 bases or
more, the C1 sequence must occur in the same chromosome as the C2
sequence,

C1/C2 - any positive or negative strand DNA sequence of 40 or more bases
such that the C1 sequence is adjacent to the C2 sequence,

T1 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T2 sequence, the T1 and T2
sequences must be between about 1kb and 105kb apart, and

T2 sequence - any positive or negative strand DNA sequence of 20 bases or
more that is on the same chromosome as the T1 sequence, the T2 or T1
sequences must be between about 1kb and 105kb apart.






Figure 1 (Levels 1 to 6; labels: "beads-on-a-string" form of chromatin, 11 nm; 30-nm chromatin fiber of packed nucleosomes; section of chromosome in an extended form; condensed section of metaphase chromosome, 700 nm; entire metaphase chromosome, 1400 nm)







Figure 2 (labels: nucleosome; sequence-specific DNA-binding proteins; panel (b))






Figure 3 (labels: looped domain; proteins forming a chromosome axis)






Figure 4 (Chromosome K, double strand DNA: T1, Gene b, Gene c, Gene d, T2, spanning 5kb to 100kb; Chromosome L, double strand DNA: Gene a, C1, C2, with 0 to 100 bp between C1 & C2. Panels: (a) Transcription and Editing, giving a single strand RNA carrying C1 and C2; (b) Movement of the RNA through the Nucleus; (c) Connectron Formation, with two triple strand Hoogsteen helices and 30 nm particles)






(Plot on drawing sheet 10/26; axis label: Number of sequences)







Figure 8 






Figure 9 (Process Flow: Process Genome Into Blocking Fragment File; Compute the Connectrons for a Genome; Analyze Possible Connectron)








Figure 10 - page 2 (flowchart; legible labels: Initialize Current-Element-Count to 0; Initialize Current-Element to 0; Read New-Element; Current-Element-Count; Write Current-Element and Current-Element-Count; DBP-Size Blocking-File; Initialize Current-Element-Count to 0; Set Current-Element to New-Element)
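The legible boxes of Figure 10 - page 2 (a Current-Element and a Current-Element-Count initialized to 0, a New-Element read in, the element and its count written out alongside a DBP-Size Blocking-File, then the count reset and Current-Element set to New-Element) read like a run-length encoding loop. The sketch below implements that reading only. What an "element" is and the layout of the DBP-Size Blocking-File cannot be recovered from the drawing, so the integer input, the (element, count) output, and the function name are assumptions.

```python
from typing import Iterable, List, Tuple


def run_length_encode(elements: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive repeats into (element, count) records.

    Follows one reading of Figure 10 - page 2: keep a Current-Element and a
    Current-Element-Count, read each New-Element, and write the pair out
    whenever the element changes. The integer elements are an assumption.
    """
    records: List[Tuple[int, int]] = []
    current, count = 0, 0                          # Current-Element, Current-Element-Count
    for new in elements:                           # Read New-Element
        if count and new == current:
            count += 1                             # same element: extend the run
        else:
            if count:
                records.append((current, count))   # element changed: write the pair
            current, count = new, 1                # Set Current-Element to New-Element
    if count:
        records.append((current, count))           # flush the last run
    return records


if __name__ == "__main__":
    print(run_length_encode([5, 5, 5, 2, 2, 7]))  # [(5, 3), (2, 2), (7, 1)]
```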






Figure 11 - page 1 (flowchart; legible labels: Initialize Window-Size to 27; Figure 13; Chromosome-Number to 1)






Figure 11 - page 2 (flowchart; legible labels: T1-Window-Position to 1; see Figure 14; Read DBP-Size Blocking File entry for T1-Window; Initialize T2-Window-Position to T1-Window-Position + 5,000; T2-Window-Position)








Figure 11 - page 3 (flowchart; legible label: Increment Chromosome-Number)







Figure 12 - page 1 (flowchart; legible labels: Select Possible-Connectron; Set Gene-Number to 1; Set Chromosome-Number to Chromosome-Number of Possible-Connectron; ... end of C1/C2 sequence with Gap-Distance ... end of Gene; Write out Possible-Connectron as Real-Connectron; Write description of the Real-Connectron ... Patent Claims)











Figure 12 - page 2








Figure 13 (flowchart; legible labels: Initialize Chromosome-Number to 1; Initialize Gene-Number to 1; Increment Gene-Number; Gene-Number within Chromosome)






Figure 14 (flowchart; legible labels: Find DBP-Size Blocking File entry for T1-Window; Initialize; Read Current-Element and This-Instance; DBP-Size Blocking File; Set T1-Window... to This-Instance)







Figure 15 (flowchart; legible label: Initialize T2-Window-Instance to 0)







Figure 16 - page 1 (flowchart; legible labels: T1-Window-Instance; T2-Window-Instance; Set C1-Window-Instance to T1-Window-Instance; Set C2-Window-Instance to T2-Window-Instance)







Figure 16 - page 2 (flowchart; legible label: Set Genome-Usage memory for T1-Window, T2-Window, C1-Window and C2-Window)






Figure 17 - page 1 (flowchart; legible labels: Scan Genome-Usage memory for Potential-Connectrons; Initialize Window Position to 1; Select Positive-Strand 27-base window; Write out Potential-Connectron; Increment window)








Figure 17 - page 2 
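Figure 17 scans positive-strand 27-base windows, and Figure 11 starts the T2 window 5,000 bases past the T1 window. The sketch below is not a transcription of those flowcharts; it is a compact hash-index search for the tetradic relationship of claim 10 (C1 equal to T1, C2 equal to T2, C1 adjacent to C2, T1 and T2 about 1kb to 105kb apart on one chromosome). The window length k = 27 follows Figure 17, the 1,000 to 105,000 start-to-start distance follows the claims, and max_c_gap reflects the "0 to 100 bp between C1 & C2" label of Figure 4, defaulting to 0 (strictly adjacent). Negative-strand matches, which the claims also allow, would need a second pass over the reverse complement and are omitted. All function and parameter names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def index_windows(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-base window of seq to the 0-based positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    return index


def find_candidates(control: str, target: str, k: int = 27, max_c_gap: int = 0,
                    min_loop: int = 1_000, max_loop: int = 105_000
                    ) -> List[Tuple[int, int, int, int]]:
    """Return (c1_pos, c2_pos, t1_pos, t2_pos) for every case where
      control[c1_pos:c1_pos+k] == target[t1_pos:t1_pos+k]   (C1 = T1),
      control[c2_pos:c2_pos+k] == target[t2_pos:t2_pos+k]   (C2 = T2),
      C2 starts 0..max_c_gap bases after C1 ends, and
      T2 starts min_loop..max_loop bases after T1 starts.
    Positive strand only; control and target may be the same chromosome.
    """
    windows = index_windows(target, k)   # built once per target chromosome
    hits: List[Tuple[int, int, int, int]] = []
    for c1_pos in range(len(control) - 2 * k + 1):
        t1_hits = windows.get(control[c1_pos:c1_pos + k], [])
        if not t1_hits:
            continue                     # C1 window never occurs in the target
        for gap in range(max_c_gap + 1):
            c2_pos = c1_pos + k + gap
            c2 = control[c2_pos:c2_pos + k]
            if len(c2) < k:
                break                    # C2 would run past the end of control
            for t1_pos in t1_hits:
                for t2_pos in windows.get(c2, []):
                    if min_loop <= t2_pos - t1_pos <= max_loop:
                        hits.append((c1_pos, c2_pos, t1_pos, t2_pos))
    return hits


if __name__ == "__main__":
    # Toy check with k = 4: C1, a 2-base gap, C2; the target holds T1 ... T2.
    control = "ACGT" + "AA" + "TGCA"
    target = "ACGT" + "N" * 10 + "TGCA"
    print(find_candidates(control, target, k=4, max_c_gap=2, min_loop=5, max_loop=20))
    # [(0, 6, 0, 14)]
```

Indexing the target once turns each control window into a dictionary lookup instead of a rescan of the chromosome; memory then grows with the number of distinct windows in the target.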






(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property Organization 
International Bureau 

(43) International Publication Date 
13 December 2001 (13.12.2001) 




PCT 



(10) International Publication Number 

WO 01/94542 A3 



(51) International Patent Classification 7 : G06F 19/00 

(21) International Application Number: PCT/US01/16471

(22) International Filing Date: 31 May 2001 (31.05.2001) 

(25) Filing Language: English 

(26) Publication Language: English 



(30) Priority Data:
60/208,650    2 June 2000 (02.06.2000)    US
09/866,925    30 May 2001 (30.05.2001)    US



(71) Applicant: GLOBAL DETERMINANTS, INC. 

[US/US]; 17800 Mill Creek Drive, Derwood, MD 
20855-1019 (US). 

(72) Inventor: FELDMANN, Richard, J.; 17800 Mill Creek 
Drive, Derwood, MD 20855-1019 (US). 



(74) Agent: ZEGEER, Jim; 801 North Pitt Street #108, 
Alexandria, VA 22314 (US). 

(81) Designated States (national): AU, CA, IL, JP, MX. 

(84) Designated States (regional): European patent (AT, BE, 
CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, 
NL, PT, SE, TR). 

Published: 

— with international search report 

(88) Date of publication of the international search report: 

18 April 2002 

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and Abbreviations" appearing at the
beginning of each regular issue of the PCT Gazette.



(54) Title: ALGORITHMIC DETERMINATION OF CONNECTRONS

(57) Abstract: An algorithm has been developed to identify four DNA sequences of 20 bases or more that form a structure called
a connectron. Two sequences C1 and C2 are expressed as RNA in the 3'UTR of some genes in many prokaryotic, archea and
eukaryotic genomes. The other half of a connectron is two DNA sequences T1 and T2 that are 1kb to 105kb apart on the same
chromosome. The C1 sequence is identical to the T1 sequence and the C2 sequence is identical to the T2 sequence. C1/C2 and
T1-T2 can be on different chromosomes. The C1/C2 RNA sequence of the gene transcript forms a triple-stranded Hoogsteen helix
with the double-stranded T1 and T2 DNA sequences. The formation of connectrons blocks expression of genes between T1 and T2.

an extent that no meaningful international search can be carried out, specifically:



Claim Nos.: 

because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 

This International Searching Authority found multiple inventions in this international application, as follows:

1. ☒ As all required additional search fees were timely paid by the applicant, this international search report covers all
searchable claims.

As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite 
payment of any additional fee. 

3. □ As only some of the required additional search fees were timely paid by the applicant, this international search report
covers only those claims for which fees were paid, specifically claims Nos.: 

No required additional search fees were timely paid by the applicant. Consequently, this international search report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 
Continuation of Item 4 of the first sheet:

The title does not comply with PCT Rule 4.3 because it is longer than 7 words. The new title is as follows: 
Algorithmic determination of connectrons. 



BOX II. OBSERVATIONS WHERE UNITY OF INVENTION IS LACKING

This application contains the following inventions or groups of inventions which are not so linked as to form a single general inventive 
concept under PCT Rule 13.1. In order for all inventions to be examined, the appropriate additional examination fees must be paid. 

Group I, claim(s) 1, drawn to a promoter detection comprising exclusion of introns and exons.
Group II, claim(s) 2, drawn to a promoter detection method comprising detection of DNA linked to RNA.
Group III, claim(s) 3, drawn to a promoter detection method comprising detection of connectron behavior.
Group IV, claim(s) 4, drawn to a method of modifying expression comprising detection of changes in connectron behavior.
Group V, claim(s) 5, drawn to a method of detecting integration by detection of connectrons.
Group VI, claim(s) 6, drawn to a method of detecting expression of genes by detecting flow of connectrons.
Group VII, claim(s) 7, drawn to a method of modifying a body by modification of connectrons.
Group VIII, claim(s) 8, drawn to a method of detection of connectrons by any of three methods: a) determining base composition, b)
determining one or more sites of control sequence organization, and c) determining sites of target application.
Group IX, claim(s) 9, drawn to a method of determining cell response by use of complete genome sequences and detection of changes in
connectrons caused by stimulus to the genome.
Group X, claim(s) 10, drawn to polynucleotides with a defined symmetry.
Group XI, claim(s) 11, drawn to polynucleotides with a first connectron relationship.
Group XII, claim(s) 12, drawn to polynucleotides with a second connectron relationship.
Group XIII, claim(s) 13, drawn to polynucleotides with a third connectron relationship.
Group XIV, claim(s) 14, drawn to polynucleotides with a fourth connectron relationship of plants and higher animals.
Group XV, claim(s) 15, drawn to polynucleotides with a fifth connectron relationship.
Group XVI, claim(s) 16, drawn to polynucleotides with a sixth connectron relationship.
Group XVII, claim(s) 17, drawn to polynucleotides with a seventh connectron relationship.
Group XVIII, claim(s) 18, drawn to polynucleotides with an eighth connectron relationship.
Group XIX, claim(s) 19, drawn to polynucleotides with a ninth connectron relationship.

This application contains claims directed to more than one species of the generic invention. These species are deemed to lack unity of
invention because they are not so linked as to form a single general inventive concept under PCT Rule 13.1.

In order for more than one species to be examined, the appropriate additional examination fees must be paid. The species are as follows: 

Group VIII is drawn to three species of detection of connectrons comprising the alternative steps of: 1) determining base composition, 2)
determining one or more sites of control sequence organization, and 3) determining sites of target application.
The claims are deemed to correspond to the species listed above in the following manner:



The following claim(s) are generic: 8. 

The inventions listed as Groups 1-19 do not relate to a single general inventive concept under PCT Rule 13.1 because, under PCT Rule
13.2, they lack the same or corresponding special technical features for the following reasons: PCT Rule 13.1 and Annex B do not
provide for unity of invention between two or more different products or methods of use that share a special technical feature. To the
extent the groups have a special technical feature of a connectron, the groups are drawn to different methods of use or detection of
connectrons. To the extent the groups are drawn to compositions of connectrons, the groups are drawn to compositions with different
structures that lack a common special technical feature.

The species listed above do not relate to a single general inventive concept under PCT Rule 13.1 because, under PCT Rule 13.2, the
species lack the same or corresponding special technical features for the following reasons: The species listed for Group VIII are drawn
to three different methods with different steps that produce different results.